Desenvolvimento de um pipeline para análise genômica e transcriptômica com base em Web services
Main author: | Melo, Henrique Velloso Ferreira |
---|---|
Publication date: | 2009 |
Document type: | Master's thesis |
Language: | por |
Source title: | Repositório Institucional da UFSCAR |
Full text: | https://repositorio.ufscar.br/handle/ufscar/6951 |
Abstract: | Pipeline systems for genomic and transcriptomic analysis aim to bridge the existing analysis tools, thereby reducing researchers' effort. Most pipelines described in the literature lack features that would be useful in genome or transcriptome sequencing projects: the ability to track project results during development, including the generation of partial reports; a collaborative environment in which the participating laboratories can contribute new data and chromatograms; configurable analysis parameters; support for multiple pipelines; and the ability to add new tools and modules. In this work, a pipeline prototype was developed to overcome these shortcomings. The progress of a sequencing project is tracked throughout its development: chromatograms are received progressively as the project advances, and partial reports are generated from the newly received data. Communication with the processing server takes place through a Web service, which offers a universal language interface and allows client applications on heterogeneous platforms to submit data and execute operations and queries. Pipelines are configured in XML documents written in a predefined format, in which researchers choose the tools and parameters to use. The prototype supports multiple pipelines running simultaneously in the same project. Pipelines are executed in parallel by means of thread pools, which increases efficiency by distributing the workload on multiprocessor systems. The prototype is also extensible, since each pipeline step is wrapped in a module. New modules can be added by implementing a programming interface, without recompiling the system, and are registered declaratively through XML documents. A client application was also developed on the Sakai collaborative platform, allowing the research groups involved in a sequencing project to create pipelines, view results, and exchange information on the project's current status. To evaluate the prototype, a case study was carried out: sequences from the sequencing of the Sphenophorus levis transcriptome were submitted, and a pipeline was configured to analyze the data. The case study indicated that the prototype is efficient and produces good results. |
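
The architecture outlined in the abstract (pipelines declared in XML, each step wrapped in a module behind a common interface, and several pipelines run in parallel on a thread pool) can be illustrated with a minimal sketch. This is not the thesis implementation: the record does not state the prototype's language or XML schema, so Python is used for brevity, and every element name, attribute name, module (`trim`, `upper`), and function below is hypothetical.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical configuration: element and attribute names are invented
# for this sketch and are not taken from the thesis.
PIPELINE_XML = """
<pipelines>
  <pipeline name="est-basic">
    <step module="trim"><param name="min_length" value="4"/></step>
    <step module="upper"/>
  </pipeline>
  <pipeline name="est-strict">
    <step module="trim"><param name="min_length" value="8"/></step>
    <step module="upper"/>
  </pipeline>
</pipelines>
"""

# The "programming interface" every module implements in this sketch:
# take a list of sequences and a dict of parameters, return a new list.
Module = Callable[[List[str], Dict[str, str]], List[str]]
Step = Tuple[str, Dict[str, str]]


def trim(seqs: List[str], params: Dict[str, str]) -> List[str]:
    """Toy module: drop sequences shorter than min_length."""
    min_length = int(params.get("min_length", "0"))
    return [s for s in seqs if len(s) >= min_length]


def upper(seqs: List[str], params: Dict[str, str]) -> List[str]:
    """Toy module: normalize sequences to upper case."""
    return [s.upper() for s in seqs]


# Registry of available modules; adding a module means adding an entry
# here, with no change to the pipeline runner.
MODULES: Dict[str, Module] = {"trim": trim, "upper": upper}


def parse_pipelines(xml_text: str) -> Dict[str, List[Step]]:
    """Read each <pipeline> into an ordered list of (module, params)."""
    root = ET.fromstring(xml_text.strip())
    pipelines: Dict[str, List[Step]] = {}
    for pipeline in root.findall("pipeline"):
        steps: List[Step] = []
        for step in pipeline.findall("step"):
            params = {p.get("name"): p.get("value")
                      for p in step.findall("param")}
            steps.append((step.get("module"), params))
        pipelines[pipeline.get("name")] = steps
    return pipelines


def run_pipeline(steps: List[Step], seqs: List[str]) -> List[str]:
    """Apply the configured modules in order to a copy of the input."""
    data = list(seqs)
    for module_name, params in steps:
        data = MODULES[module_name](data, params)
    return data


def run_all(xml_text: str, seqs: List[str],
            workers: int = 2) -> Dict[str, List[str]]:
    """Run every configured pipeline on the same input via a thread pool."""
    pipelines = parse_pipelines(xml_text)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(run_pipeline, steps, seqs)
                   for name, steps in pipelines.items()}
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    reads = ["acgt", "acgtacgtac", "ac"]
    for name, result in run_all(PIPELINE_XML, reads).items():
        print(name, result)
```

In Python specifically, a thread pool gives real parallelism mainly when the steps spend their time in external programs or I/O, which is the usual case when a step wraps a command-line analysis tool; CPU-bound steps written in pure Python would more likely need a process pool.
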
id |
SCAR_fbb36402899b39b836542108cfad3912 |
---|---|
oai_identifier_str |
oai:repositorio.ufscar.br:ufscar/6951 |
network_acronym_str |
SCAR |
network_name_str |
Repositório Institucional da UFSCAR |
repository_id_str |
4322 |
spelling |
Melo, Henrique Velloso Ferreira; Silva, Flávio Henrique da; http://lattes.cnpq.br/1757309852446263; http://lattes.cnpq.br/8672694921739001; 24789bdf-6b89-4f77-bbcc-ac241df174c2; 2016-08-17T18:39:30Z; 2009-10-20; 2016-08-17T18:39:30Z; 2009-09-04; MELO, Henrique Velloso Ferreira. Desenvolvimento de um pipeline para análise genômica e transcriptômica com base em Web services. 2009. 108 f. Dissertação (Mestrado em Multidisciplinar) - Universidade Federal de São Carlos, São Carlos, 2009.; https://repositorio.ufscar.br/handle/ufscar/6951; Pipeline systems for genome and transcriptome analysis aim to create communication bridges between the different tools in order to reduce the researcher's effort during analysis. Most pipelines described in the literature lack features that are important for sequencing projects: the ability to track and generate partial results as the project develops; a collaborative environment where the participating laboratories can contribute new data and chromatograms; configurable analysis parameters; support for multiple pipelines with different configurations; and support for adding new programs and modules. In this work, a prototype was developed that addresses these deficiencies. Project progress is tracked throughout development: the prototype receives raw chromatogram data, analyzes the partial data, and issues reports with the results. Communication with the processing server takes place via a Web service, which offers an interface in the universal language XML that lets client applications on heterogeneous platforms submit data and perform operations and queries. Pipelines are configured through XML files in a specific format, in which the researcher defines the programs and parameters to use. The prototype supports multiple pipelines running simultaneously in the same project. Pipelines are executed in parallel by a thread pool, which increases efficiency by dividing the processing load on servers with more than one core. A client application was developed on the collaborative platform, allowing the research groups involved in the sequencing to create pipelines, view results, and exchange information on the project's progress. Another distinguishing feature of the prototype is its extensibility. Each pipeline step is encapsulated in a module. New modules can be added without recompiling the whole system, provided they implement a specific interface, and are registered declaratively in XML files. A case study was carried out by submitting chromatograms from the sequencing of ESTs (Expressed Sequence Tags) of Sphenophorus levis. A pipeline was configured for the study, and its execution showed that the system is efficient and produces good results.; application/pdf; por; Universidade Federal de São Carlos; Programa de Pós-Graduação em Biotecnologia - PPGBiotec; UFSCar; BR; Bioinformática; Análise genômica; Sequenciamento; Análise transcriptômica; Pipeline; Genomic analysis; Transcriptomic analysis; Web service; OUTROS; Desenvolvimento de um pipeline para análise genômica e transcriptômica com base em Web services; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; -1; -1; e2c04fa9-1e62-4316-915c-35a38d859aae; info:eu-repo/semantics/openAccess; reponame:Repositório Institucional da UFSCAR; instname:Universidade Federal de São Carlos (UFSCAR); instacron:UFSCAR; ORIGINAL; 2590.pdf; application/pdf; 1752867; https://repositorio.ufscar.br/bitstream/ufscar/6951/1/2590.pdf; 7dd2196f0a9d489b35a0759a6cc018c6; MD5; 1; TEXT; 2590.pdf.txt; 2590.pdf.txt; Extracted text; text/plain; 195945; https://repositorio.ufscar.br/bitstream/ufscar/6951/2/2590.pdf.txt; 5a7f8beaaadd5bddc5a6dd6bf160b523; MD5; 2; THUMBNAIL; 2590.pdf.jpg; 2590.pdf.jpg; IM Thumbnail; image/jpeg; 6454; https://repositorio.ufscar.br/bitstream/ufscar/6951/3/2590.pdf.jpg; dfe907f93cb616a6b2bdcc556ee26f27; MD5; 3; ufscar/6951; 2023-09-18 18:30:33.044; oai:repositorio.ufscar.br:ufscar/6951; Repositório Institucional; PUB; https://repositorio.ufscar.br/oai/request; opendoar:4322; 2023-09-18T18:30:33; Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR); false |
dc.title.por.fl_str_mv |
Desenvolvimento de um pipeline para análise genômica e transcriptômica com base em Web services |
author |
Melo, Henrique Velloso Ferreira |
author_facet |
Melo, Henrique Velloso Ferreira |
author_role |
author |
dc.contributor.authorlattes.por.fl_str_mv |
http://lattes.cnpq.br/8672694921739001 |
dc.contributor.author.fl_str_mv |
Melo, Henrique Velloso Ferreira |
dc.contributor.advisor1.fl_str_mv |
Silva, Flávio Henrique da |
dc.contributor.advisor1Lattes.fl_str_mv |
http://lattes.cnpq.br/1757309852446263 |
dc.contributor.authorID.fl_str_mv |
24789bdf-6b89-4f77-bbcc-ac241df174c2 |
contributor_str_mv |
Silva, Flávio Henrique da |
dc.subject.por.fl_str_mv |
Bioinformática Análise genômica Sequenciamento Análise transcriptômica Pipeline |
topic |
Bioinformática Análise genômica Sequenciamento Análise transcriptômica Pipeline Genomic analysis Transcriptomic analysis Web service OUTROS |
dc.subject.eng.fl_str_mv |
Genomic analysis Transcriptomic analysis Web service |
dc.subject.cnpq.fl_str_mv |
OUTROS |
publishDate |
2009 |
dc.date.available.fl_str_mv |
2009-10-20 2016-08-17T18:39:30Z |
dc.date.issued.fl_str_mv |
2009-09-04 |
dc.date.accessioned.fl_str_mv |
2016-08-17T18:39:30Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
MELO, Henrique Velloso Ferreira. Desenvolvimento de um pipeline para análise genômica e transcriptômica com base em Web services. 2009. 108 f. Dissertação (Mestrado em Multidisciplinar) - Universidade Federal de São Carlos, São Carlos, 2009. |
dc.identifier.uri.fl_str_mv |
https://repositorio.ufscar.br/handle/ufscar/6951 |
dc.language.iso.fl_str_mv |
por |
language |
por |
dc.relation.confidence.fl_str_mv |
-1 -1 |
dc.relation.authority.fl_str_mv |
e2c04fa9-1e62-4316-915c-35a38d859aae |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Universidade Federal de São Carlos |
dc.publisher.program.fl_str_mv |
Programa de Pós-Graduação em Biotecnologia - PPGBiotec |
dc.publisher.initials.fl_str_mv |
UFSCar |
dc.publisher.country.fl_str_mv |
BR |
publisher.none.fl_str_mv |
Universidade Federal de São Carlos |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFSCAR instname:Universidade Federal de São Carlos (UFSCAR) instacron:UFSCAR |
instname_str |
Universidade Federal de São Carlos (UFSCAR) |
instacron_str |
UFSCAR |
institution |
UFSCAR |
reponame_str |
Repositório Institucional da UFSCAR |
collection |
Repositório Institucional da UFSCAR |
bitstream.url.fl_str_mv |
https://repositorio.ufscar.br/bitstream/ufscar/6951/1/2590.pdf https://repositorio.ufscar.br/bitstream/ufscar/6951/2/2590.pdf.txt https://repositorio.ufscar.br/bitstream/ufscar/6951/3/2590.pdf.jpg |
bitstream.checksum.fl_str_mv |
7dd2196f0a9d489b35a0759a6cc018c6 5a7f8beaaadd5bddc5a6dd6bf160b523 dfe907f93cb616a6b2bdcc556ee26f27 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR) |
repository.mail.fl_str_mv |
|
_version_ |
1813715555191881728 |