Ferramenta computacional para identificação de micro-organismos com base em assinaturas genômicas

Andrighetti, Tahila [UNESP]

Bibliographic details
Main author:	Andrighetti, Tahila [UNESP]
Publication date:	2015
Document type:	Dissertation
Language:	por
Source title:	Repositório Institucional da UNESP
Full text:	http://hdl.handle.net/11449/132017 http://www.athena.biblioteca.unesp.br/exlibris/bd/cathedra/11-11-2015/000851881.pdf
Abstract:	Microbial communities play a crucial role in all ecosystems on Earth, since they metabolize essential compounds. Given this role, they are investigated in Medicine, Biotechnology, Ecology and Food Science, among other fields. However, only 1% of all known micro-organism species can be cultivated in vitro, so unravelling their functions and taxonomic classification demands new approaches. With the advent of new sequencing strategies, the entire genome of the micro-organisms in a given habitat can be experimentally extracted, but the fragments obtained are small (<1500 bp), and processing the data remains a huge challenge. The most widely used metagenomic analysis tools classify sequences by homology. However, the computational time grows exponentially as the read length decreases. There is an evident need for alternative methods that can analyze metagenomic data quickly and accurately. This study proposes a new method for identifying bacterial sequences in metagenomic data. The genomes of 2164 bacterial strains were obtained from GenBank and distributed into test and control sets. Each set was randomly fragmented into sequences of 64, 128, 256, 512, 1024, 2048 and 4096 base pairs. The sequence organization measures applied to the reads were GC content, dinucleotide abundance, and diplet, triplet and tetraplet entropy. The mean and standard deviation of the control sequence values were calculated for each bacterial species, genus and family. Genomic signatures and entropies were then combined to classify bacterial sequences by family, genus and species. The performance of the proposed methodology was determined by measuring sensitivity, specificity, accuracy and harmonic mean on the test set. The results indicated that GC content performed best among the signatures investigated. We also considered combinations of features, the combination considering GC ...
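The measures named in the abstract can be sketched in a few lines of Python. This is a minimal illustration and not the dissertation's implementation: the function names (`gc_content`, `kmer_entropy`, `random_fragments`), the use of overlapping k-mers, the Shannon entropy in bits, and the uniform sampling of fragment start positions are all assumptions, since the abstract does not specify them. Dinucleotide abundance and the classification rule are not shown because the record gives no formula for either.

```python
import math
import random
from collections import Counter

BASES = set("ACGT")


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous (A/C/G/T) bases of seq."""
    seq = seq.upper()
    counted = sum(seq.count(b) for b in BASES)
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def kmer_entropy(seq: str, k: int) -> float:
    """Shannon entropy, in bits, of the overlapping k-mers of seq.

    k = 2, 3, 4 would correspond to the diplet, triplet and tetraplet
    entropies of the abstract (assumption: overlapping windows).
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Skip windows containing ambiguous symbols such as N.
    kmers = [km for km in kmers if set(km) <= BASES]
    total = len(kmers)
    if total == 0:
        return 0.0
    counts = Counter(kmers)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def random_fragments(genome: str, length: int, n: int, seed: int = 0) -> list[str]:
    """Draw n fragments of the given length at uniformly random start positions."""
    if len(genome) < length:
        return []
    rng = random.Random(seed)
    starts = [rng.randrange(len(genome) - length + 1) for _ in range(n)]
    return [genome[s:s + length] for s in starts]


if __name__ == "__main__":
    # Hypothetical toy genome; real input would be a GenBank sequence.
    toy_genome = "ATGCGCGTTAACCGGTTAGC" * 50
    for fragment in random_fragments(toy_genome, 64, 3):
        print(round(gc_content(fragment), 3), round(kmer_entropy(fragment, 2), 3))
```

Per-taxon reference values, as described in the abstract, would then be the mean and standard deviation of these measures over the control fragments of each species, genus or family (for example with `statistics.mean` and `statistics.stdev`).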

Item metadata

id	UNSP_51b277c45bbb4d520699fe7efe41ab14
oai_identifier_str	oai:repositorio.unesp.br:11449/132017
network_acronym_str	UNSP
network_name_str	Repositório Institucional da UNESP
repository_id_str	2946
spelling	Ferramenta computacional para identificação de micro-organismos com base em assinaturas genômicas Nucleotídeos Bioinformática Entropia Genoma humano Micro-organismos Seqüenciamento de nucleotídeo Nucleotide sequence Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) FAPESP: 2013/1517-4 Universidade Estadual Paulista (Unesp) Rybarczyk Filho, José Luiz [UNESP] Lemke, Ney [UNESP] Universidade Estadual Paulista (Unesp) Andrighetti, Tahila [UNESP] 2015-12-10T14:23:05Z 2015-12-10T14:23:05Z 2015-02-27 info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/masterThesis 53 f. application/pdf ANDRIGHETTI, Tahila. Ferramenta computacional para identificação de micro-organismos com base em assinaturas genômicas. 2015. 53 f. Dissertação (mestrado) - Universidade Estadual Paulista Júlio de Mesquita Filho, Instituto de Biociências de Botucatu, 2015. http://hdl.handle.net/11449/132017 000851881 http://www.athena.biblioteca.unesp.br/exlibris/bd/cathedra/11-11-2015/000851881.pdf 33004064026P9 7977035910952141 Aleph reponame:Repositório Institucional da UNESP instname:Universidade Estadual Paulista (UNESP) instacron:UNESP por info:eu-repo/semantics/openAccess 2024-01-08T06:28:33Z oai:repositorio.unesp.br:11449/132017 Repositório Institucional PUB http://repositorio.unesp.br/oai/request opendoar:2946 2024-08-05T22:28:24.973581 Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP) false
dc.title.none.fl_str_mv	Ferramenta computacional para identificação de micro-organismos com base em assinaturas genômicas
title	Ferramenta computacional para identificação de micro-organismos com base em assinaturas genômicas
spellingShingle	Ferramenta computacional para identificação de micro-organismos com base em assinaturas genômicas Andrighetti, Tahila [UNESP] Nucleotídeos Bioinformática Entropia Genoma humano Micro-organismos Seqüenciamento de nucleotídeo Nucleotide sequence
title_short	Ferramenta computacional para identificação de micro-organismos com base em assinaturas genômicas
title_full	Ferramenta computacional para identificação de micro-organismos com base em assinaturas genômicas
title_fullStr	Ferramenta computacional para identificação de micro-organismos com base em assinaturas genômicas
title_full_unstemmed	Ferramenta computacional para identificação de micro-organismos com base em assinaturas genômicas
title_sort	Ferramenta computacional para identificação de micro-organismos com base em assinaturas genômicas
author	Andrighetti, Tahila [UNESP]
author_facet	Andrighetti, Tahila [UNESP]
author_role	author
dc.contributor.none.fl_str_mv	Rybarczyk Filho, José Luiz [UNESP] Lemke, Ney [UNESP] Universidade Estadual Paulista (Unesp)
dc.contributor.author.fl_str_mv	Andrighetti, Tahila [UNESP]
dc.subject.por.fl_str_mv	Nucleotídeos Bioinformática Entropia Genoma humano Micro-organismos Seqüenciamento de nucleotídeo Nucleotide sequence
topic	Nucleotídeos Bioinformática Entropia Genoma humano Micro-organismos Seqüenciamento de nucleotídeo Nucleotide sequence
publishDate	2015
dc.date.none.fl_str_mv	2015-12-10T14:23:05Z 2015-12-10T14:23:05Z 2015-02-27
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	ANDRIGHETTI, Tahila. Ferramenta computacional para identificação de micro-organismos com base em assinaturas genômicas. 2015. 53 f. Dissertação (mestrado) - Universidade Estadual Paulista Júlio de Mesquita Filho, Instituto de Biociências de Botucatu, 2015. http://hdl.handle.net/11449/132017 000851881 http://www.athena.biblioteca.unesp.br/exlibris/bd/cathedra/11-11-2015/000851881.pdf 33004064026P9 7977035910952141
identifier_str_mv	ANDRIGHETTI, Tahila. Ferramenta computacional para identificação de micro-organismos com base em assinaturas genômicas. 2015. 53 f. Dissertação (mestrado) - Universidade Estadual Paulista Júlio de Mesquita Filho, Instituto de Biociências de Botucatu, 2015. 000851881 33004064026P9 7977035910952141
url	http://hdl.handle.net/11449/132017 http://www.athena.biblioteca.unesp.br/exlibris/bd/cathedra/11-11-2015/000851881.pdf
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	53 f. application/pdf
dc.publisher.none.fl_str_mv	Universidade Estadual Paulista (Unesp)
publisher.none.fl_str_mv	Universidade Estadual Paulista (Unesp)
dc.source.none.fl_str_mv	Aleph reponame:Repositório Institucional da UNESP instname:Universidade Estadual Paulista (UNESP) instacron:UNESP
instname_str	Universidade Estadual Paulista (UNESP)
instacron_str	UNESP
institution	UNESP
reponame_str	Repositório Institucional da UNESP
collection	Repositório Institucional da UNESP
repository.name.fl_str_mv	Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)
repository.mail.fl_str_mv
_version_	1808129429528903680
