Develop a pipeline to call SNPs from whole genome sequences of Toxoplasma gondii and integrate with conventional genotyping methods
Main author: | Castro, Bruno Bello Pede |
---|---|
Publication date: | 2019 |
Document type: | Thesis |
Language: | eng |
Source title: | Biblioteca Digital de Teses e Dissertações da USP |
Full text: | https://www.teses.usp.br/teses/disponiveis/10/10134/tde-28112019-114932/ |
Abstract: | Toxoplasmosis is a parasitic disease caused by Toxoplasma gondii, an intracellular parasite related to Plasmodium falciparum, the agent that causes malaria in humans. Toxoplasma gondii infects all warm-blooded vertebrates, including mammals and birds. Recent advances in DNA sequencing technologies have made it possible to obtain and use whole genome sequences to genotype any organism, including T. gondii. Until now, PCR-RFLP and MLST have been the most common methods to genotype and identify T. gondii, and an invaluable database has been generated over the last decades using these methods. However, conventional PCR-RFLP and MLST data cannot be easily integrated with whole genome sequence typing. The objective of this work is to develop a pipeline that maps reads from whole genome sequencing to identify SNPs (single nucleotide polymorphisms) and to integrate the resulting data with PCR-RFLP and MLST data. In this work, we used sequencing data from a total of 62 T. gondii isolates from various locations around the world. From these sequences, improved data for phylogenetic analysis were generated using the SplitsTree4 software, and population genetics data were generated with the FastStructure tool. In addition, other tools that work in conjunction with the pipeline were developed, making it possible to extract genomic sequences for the 10 PCR-RFLP markers and eight introns for MLST that have been used for genetic analysis of T. gondii in the literature. To make these tools available to the research community, we integrated all the software and the instruction set used in the Perl scripts into a virtual machine, making it possible to perform bioinformatics tasks from any personal computer, regardless of the operating system it runs. For this, we used the multiplatform virtualization software VirtualBox. Implementation of these tools will facilitate molecular genetics and population genetics studies of T. gondii. These tools can be easily modified to work with other organisms as needed. 
|
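The abstract describes two operations that can be illustrated on toy data: extracting the genomic sequence of a marker region, and listing the SNPs that distinguish an isolate from a reference. The Python sketch below is only an illustration under stated assumptions; it is not the Perl pipeline from the thesis, which works on mapped sequencing reads. The function names `extract_region` and `call_snps`, the coordinates, and both sequences are hypothetical, and the two sequences are assumed to be already aligned and of equal length.

```python
from typing import List, Tuple


def extract_region(sequence: str, start: int, end: int) -> str:
    """Return the 1-based, inclusive region [start, end] of a sequence."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("region is outside the sequence")
    return sequence[start - 1:end]


def call_snps(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """List (position, reference_base, sample_base) for every mismatch.

    Both sequences must already be aligned and of equal length.
    Positions are 1-based; sites with an ambiguous 'N' are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    pairs = zip(reference.upper(), sample.upper())
    for position, (ref_base, sample_base) in enumerate(pairs, start=1):
        if ref_base != sample_base and "N" not in (ref_base, sample_base):
            snps.append((position, ref_base, sample_base))
    return snps


# Hypothetical toy sequences; real input would come from mapped reads.
reference = "ATGCCGTAAGCTTGACCTGA"
isolate = "ATGCTGTAAGCTTGACCAGA"

# Hypothetical marker coordinates (1-based, inclusive).
marker_reference = extract_region(reference, 3, 18)
marker_isolate = extract_region(isolate, 3, 18)

snps = call_snps(marker_reference, marker_isolate)
print(snps)  # [(3, 'C', 'T'), (16, 'T', 'A')]
```

Positions are reported relative to the extracted marker; adding `start - 1` converts them back to coordinates on the full sequence.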
id |
USP_5effb3f874d46d42e47b7b14100a2365 |
---|---|
oai_identifier_str |
oai:teses.usp.br:tde-28112019-114932 |
network_acronym_str |
USP |
network_name_str |
Biblioteca Digital de Teses e Dissertações da USP |
repository_id_str |
2721 |
spelling |
Develop a pipeline to call SNPs from whole genome sequences of Toxoplasma gondii and integrate with conventional genotyping methods; Desenvolvimento de uma pipeline para identificação de SNPs a partir de sequências genômicas completas de Toxoplasma gondii e sua integração aos métodos convencionais de genotipagem; Toxoplasma gondii; Bioinformatics; Virtual machine; Pipeline; Whole genomic sequencing
Toxoplasmosis is a parasitic disease caused by Toxoplasma gondii. Toxoplasma gondii is an intracellular parasite related to Plasmodium falciparum, the causative agent of malaria in humans. Toxoplasma gondii can infect all warm-blooded vertebrates, including mammals and birds. Recent advances in DNA sequencing technologies have made it possible to obtain whole genome sequences for practically any organism, including Toxoplasma gondii, and as a result the PCR-RFLP and MLST genotyping methods currently in use are likely to be replaced. Since the invaluable database generated over the last decades cannot be related to this new technology, this work aimed to alleviate the problem by developing a pipeline capable of mapping reads from whole genome sequencing on the Illumina platform and identifying SNPs (Single Nucleotide Polymorphisms). In this work, sequencing data from a total of 62 T. gondii isolates from various locations around the world were used. From these sequences, improved data for phylogenetic analysis were generated using the SplitsTree4 software, and population genetics data were generated with the FastStructure tool. In addition, other tools that work in conjunction with the pipeline were developed, also making it possible to extract genomic sequences for the 10 PCR-RFLP markers and eight introns that have been used for genetic analysis of T. gondii in the literature. To make these tools available to the research community, we integrated all the software and the set of instructions written in Perl into a virtual machine, making it possible to run bioinformatics tasks from any personal computer, regardless of the operating system it runs. For this, we used a multiplatform virtualization software, VirtualBox.
Biblioteca Digital de Teses e Dissertações da USP; Gennari, Solange Maria; Castro, Bruno Bello Pede; 2019-07-12; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/doctoralThesis; application/pdf; https://www.teses.usp.br/teses/disponiveis/10/10134/tde-28112019-114932/; reponame:Biblioteca Digital de Teses e Dissertações da USP; instname:Universidade de São Paulo (USP); instacron:USP; Release the content for public access.; info:eu-repo/semantics/openAccess; eng; 2024-10-09T13:16:04Z; oai:teses.usp.br:tde-28112019-114932; Biblioteca Digital de Teses e Dissertações; http://www.teses.usp.br/; PUB; http://www.teses.usp.br/cgi-bin/mtd2br.pl; virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br; opendoar:2721; 2024-10-09T13:16:04; Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP); false |
dc.title.none.fl_str_mv |
Develop a pipeline to call SNPs from whole genome sequences of Toxoplasma gondii and integrate with conventional genotyping methods; Desenvolvimento de uma pipeline para identificação de SNPs a partir de sequências genômicas completas de Toxoplasma gondii e sua integração aos métodos convencionais de genotipagem |
title |
Develop a pipeline to call SNPs from whole genome sequences of Toxoplasma gondii and integrate with conventional genotyping methods |
author |
Castro, Bruno Bello Pede |
author_facet |
Castro, Bruno Bello Pede |
author_role |
author |
dc.contributor.none.fl_str_mv |
Gennari, Solange Maria |
dc.contributor.author.fl_str_mv |
Castro, Bruno Bello Pede |
dc.subject.por.fl_str_mv |
Toxoplasma gondii; Bioinformatics; Virtual machine; Pipeline; Whole genomic sequencing |
topic |
Toxoplasma gondii; Bioinformatics; Virtual machine; Pipeline; Whole genomic sequencing |
publishDate |
2019 |
dc.date.none.fl_str_mv |
2019-07-12 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/doctoralThesis |
format |
doctoralThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://www.teses.usp.br/teses/disponiveis/10/10134/tde-28112019-114932/ |
url |
https://www.teses.usp.br/teses/disponiveis/10/10134/tde-28112019-114932/ |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
|
dc.rights.driver.fl_str_mv |
Release the content for public access. info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Release the content for public access. |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.coverage.none.fl_str_mv |
|
dc.publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP |
publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP |
dc.source.none.fl_str_mv |
reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP |
instname_str |
Universidade de São Paulo (USP) |
instacron_str |
USP |
institution |
USP |
reponame_str |
Biblioteca Digital de Teses e Dissertações da USP |
collection |
Biblioteca Digital de Teses e Dissertações da USP |
repository.name.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP) |
repository.mail.fl_str_mv |
virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br |
_version_ |
1815256541368942592 |