Develop a pipeline to call SNPs from whole genome sequences of Toxoplasma gondii and integrate with conventional genotyping methods

Bibliographic details
Main author: Castro, Bruno Bello Pede
Publication date: 2019
Document type: Thesis
Language: eng
Source title: Biblioteca Digital de Teses e Dissertações da USP
Full text: https://www.teses.usp.br/teses/disponiveis/10/10134/tde-28112019-114932/
Abstract: Toxoplasmosis is a parasitic disease caused by Toxoplasma gondii, an intracellular parasite related to Plasmodium falciparum, the agent that causes malaria in humans. T. gondii can infect all warm-blooded vertebrates, including mammals and birds. Recent advances in DNA sequencing technologies have made it possible to obtain whole genome sequences and use them to genotype any organism, including T. gondii. PCR-RFLP and MLST have been the most common methods to genotype and identify T. gondii, and an invaluable database has been generated with them over the last decades. However, conventional PCR-RFLP and MLST data cannot be easily integrated with whole genome sequence typing. The objective of this work was to develop a pipeline that maps reads from Illumina whole genome sequencing, identifies SNPs (single nucleotide polymorphisms), and integrates the results with PCR-RFLP and MLST data. We used sequencing data from a total of 62 T. gondii isolates from various locations around the world. From these sequences, improved data for phylogenetic analysis were generated with the SplitsTree4 software, and population genetics data with the FastStructure tool. In addition, other tools that work in conjunction with the pipeline were developed, making it possible to extract genomic sequences for the 10 PCR-RFLP markers and the eight introns used for MLST, which have been used for genetic analysis of T. gondii in the literature. To make these tools available to the research community, we integrated all the software and the instruction set, written as Perl scripts, into a virtual machine, making it possible to perform bioinformatics tasks from any personal computer, regardless of its operating system. For this, we used the multiplatform virtualization software VirtualBox. Implementation of these tools will facilitate molecular genetics and population genetics studies of T. gondii. The tools can be easily modified to work with other organisms as needed.
id USP_5effb3f874d46d42e47b7b14100a2365
oai_identifier_str oai:teses.usp.br:tde-28112019-114932
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
repository_id_str 2721
dc.title.none.fl_str_mv Develop a pipeline to call SNPs from whole genome sequences of Toxoplasma gondii and integrate with conventional genotyping methods
Desenvolvimento de uma pipeline para identificação de SNPs a partir de sequências genômicas completas de Toxoplasma gondii e sua integração aos métodos convencionais de genotipagem
author Castro, Bruno Bello Pede
author_facet Castro, Bruno Bello Pede
author_role author
dc.contributor.none.fl_str_mv Gennari, Solange Maria
dc.contributor.author.fl_str_mv Castro, Bruno Bello Pede
dc.subject.por.fl_str_mv Toxoplasma gondii
Bioinformatics
Virtual machine
Pipeline
Whole genome sequencing
publishDate 2019
dc.date.none.fl_str_mv 2019-07-12
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://www.teses.usp.br/teses/disponiveis/10/10134/tde-28112019-114932/
url https://www.teses.usp.br/teses/disponiveis/10/10134/tde-28112019-114932/
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv Release the content for public access.
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Release the content for public access.
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
instname_str Universidade de São Paulo (USP)
instacron_str USP
institution USP
reponame_str Biblioteca Digital de Teses e Dissertações da USP
collection Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br
_version_ 1815256541368942592