ScreenVar - a biclustering-based methodology for evaluating structural variants

Bibliographic details
Main author: NASCIMENTO JÚNIOR, Francisco do
Publication date: 2017
Document type: Thesis
Language: eng
Source title: Repositório Institucional da UFPE
Full text: https://repositorio.ufpe.br/handle/123456789/25375
Abstract: The importance of structural variants as a source of phenotypic variation has grown in recent years. At the same time, the number of tools that detect structural variations using Next-Generation Sequencing (NGS) has increased considerably with the dramatic drop in the cost of sequencing in the last ten years. As a result, properly evaluating the detected structural variants has gained prominence because of the uncertainty surrounding such alterations, with important implications for researchers and clinicians who scrutinize the human genome. These trends have raised interest in careful procedures for assessing the output of variant calling tools. Here, we characterize the technical details of structural variant detection that can affect the accuracy of detection methods, and we discuss the most important caveats of the tool evaluation process. This study emphasizes common assumptions, a variety of possible limitations, and valuable insights extracted from state-of-the-art CNV (Copy Number Variation) detection tools. Among these points, a frequently mentioned and extremely important one is the lack of a gold standard for structural variants and its impact on the evaluation of existing detection tools. Next, this document describes a biclustering-based methodology to screen a collection of structural variants and provide a set of reliable events, based on a defined equivalence criterion, that is supported by different studies. Finally, we carry out experiments with the proposed methodology using the Database of Genomic Variants (DGV) as input data. We found relevant groups of equivalent variants across different studies. In summary, this thesis shows that there is an alternative approach to the open problem of the lack of a gold standard for evaluating structural variants.
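To make the idea of screening variants by an equivalence criterion more concrete, the following is a minimal sketch only. The abstract does not state the thesis's actual criterion or its biclustering algorithm, so everything here is an assumption for illustration: the `Variant` fields, the 50% reciprocal-overlap threshold, and the rule of keeping groups reported by at least two studies are hypothetical, and a plain union-find grouping stands in for biclustering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one reported variant (not the DGV schema)."""
    study: str
    chrom: str
    start: int  # inclusive start coordinate
    end: int    # inclusive end coordinate
    kind: str   # e.g. "gain" or "loss"


def reciprocal_overlap(a, b):
    """Shared length divided by the longer variant's length (0.0 if disjoint)."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start + 1), shared / (b.end - b.start + 1))


def equivalent(a, b, threshold=0.5):
    """Assumed criterion: same kind and reciprocal overlap >= threshold."""
    return a.kind == b.kind and reciprocal_overlap(a, b) >= threshold


def supported_groups(variants, min_studies=2, threshold=0.5):
    """Group equivalent variants and keep groups seen in >= min_studies studies."""
    parent = list(range(len(variants)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison; quadratic, which is fine for a small illustration.
    for i in range(len(variants)):
        for j in range(i + 1, len(variants)):
            if equivalent(variants[i], variants[j], threshold):
                parent[find(i)] = find(j)

    groups = {}
    for i, variant in enumerate(variants):
        groups.setdefault(find(i), []).append(variant)
    return [g for g in groups.values()
            if len({v.study for v in g}) >= min_studies]
```

Because groups are merged transitively, two variants can land in the same group without overlapping each other directly; a stricter grouping would need a different rule.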
id UFPE_765bdb943b3fe864eec3669a61f2860f
oai_identifier_str oai:repositorio.ufpe.br:123456789/25375
network_acronym_str UFPE
network_name_str Repositório Institucional da UFPE
repository_id_str 2221
Funding agency: CAPES
dc.title.pt_BR.fl_str_mv ScreenVar - a biclustering-based methodology for evaluating structural variants
title ScreenVar - a biclustering-based methodology for evaluating structural variants
spellingShingle ScreenVar - a biclustering-based methodology for evaluating structural variants
NASCIMENTO JÚNIOR, Francisco do
Ciência da computação
Biologia computacional
title_short ScreenVar - a biclustering-based methodology for evaluating structural variants
title_full ScreenVar - a biclustering-based methodology for evaluating structural variants
title_fullStr ScreenVar - a biclustering-based methodology for evaluating structural variants
title_full_unstemmed ScreenVar - a biclustering-based methodology for evaluating structural variants
title_sort ScreenVar - a biclustering-based methodology for evaluating structural variants
author NASCIMENTO JÚNIOR, Francisco do
author_facet NASCIMENTO JÚNIOR, Francisco do
author_role author
dc.contributor.authorLattes.pt_BR.fl_str_mv http://lattes.cnpq.br/6683834339342079
dc.contributor.advisorLattes.pt_BR.fl_str_mv http://lattes.cnpq.br/8994178236264483
dc.contributor.author.fl_str_mv NASCIMENTO JÚNIOR, Francisco do
dc.contributor.advisor1.fl_str_mv GUIMARÃES, Katia Silva
contributor_str_mv GUIMARÃES, Katia Silva
dc.subject.por.fl_str_mv Ciência da computação
Biologia computacional
topic Ciência da computação
Biologia computacional
description The importance of structural variants as a source of phenotypic variation has grown in recent years. At the same time, the number of tools that detect structural variations using Next-Generation Sequencing (NGS) has increased considerably with the dramatic drop in the cost of sequencing in the last ten years. As a result, properly evaluating the detected structural variants has gained prominence because of the uncertainty surrounding such alterations, with important implications for researchers and clinicians who scrutinize the human genome. These trends have raised interest in careful procedures for assessing the output of variant calling tools. Here, we characterize the technical details of structural variant detection that can affect the accuracy of detection methods, and we discuss the most important caveats of the tool evaluation process. This study emphasizes common assumptions, a variety of possible limitations, and valuable insights extracted from state-of-the-art CNV (Copy Number Variation) detection tools. Among these points, a frequently mentioned and extremely important one is the lack of a gold standard for structural variants and its impact on the evaluation of existing detection tools. Next, this document describes a biclustering-based methodology to screen a collection of structural variants and provide a set of reliable events, based on a defined equivalence criterion, that is supported by different studies. Finally, we carry out experiments with the proposed methodology using the Database of Genomic Variants (DGV) as input data. We found relevant groups of equivalent variants across different studies. In summary, this thesis shows that there is an alternative approach to the open problem of the lack of a gold standard for evaluating structural variants.
publishDate 2017
dc.date.issued.fl_str_mv 2017-02-17
dc.date.accessioned.fl_str_mv 2018-08-03T19:38:31Z
dc.date.available.fl_str_mv 2018-08-03T19:38:31Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://repositorio.ufpe.br/handle/123456789/25375
url https://repositorio.ufpe.br/handle/123456789/25375
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv Attribution-NonCommercial-NoDerivs 3.0 Brazil
http://creativecommons.org/licenses/by-nc-nd/3.0/br/
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Attribution-NonCommercial-NoDerivs 3.0 Brazil
http://creativecommons.org/licenses/by-nc-nd/3.0/br/
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Universidade Federal de Pernambuco
dc.publisher.program.fl_str_mv Programa de Pos Graduacao em Ciencia da Computacao
dc.publisher.initials.fl_str_mv UFPE
dc.publisher.country.fl_str_mv Brasil
publisher.none.fl_str_mv Universidade Federal de Pernambuco
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFPE
instname:Universidade Federal de Pernambuco (UFPE)
instacron:UFPE
instname_str Universidade Federal de Pernambuco (UFPE)
instacron_str UFPE
institution UFPE
reponame_str Repositório Institucional da UFPE
collection Repositório Institucional da UFPE
bitstream.url.fl_str_mv https://repositorio.ufpe.br/bitstream/123456789/25375/5/TESE%20Francisco%20do%20Nascimento%20Junior.pdf.jpg
https://repositorio.ufpe.br/bitstream/123456789/25375/1/TESE%20Francisco%20do%20Nascimento%20Junior.pdf
https://repositorio.ufpe.br/bitstream/123456789/25375/2/license_rdf
https://repositorio.ufpe.br/bitstream/123456789/25375/3/license.txt
https://repositorio.ufpe.br/bitstream/123456789/25375/4/TESE%20Francisco%20do%20Nascimento%20Junior.pdf.txt
bitstream.checksum.fl_str_mv 960786ed718ed5c8c10ccc28ddc507fc
794ee127f9a27d065eb71104d4849c0e
e39d27027a6cc9cb039ad269a5db8e34
4b8a02c7f2818eaf00dcf2260dd5eb08
bb447788a3d2b21929aa469002806c76
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
MD5
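The checksums listed above can be used to confirm that a downloaded bitstream is intact. The sketch below assumes the files were saved locally under the names that appear at the end of their bitstream URLs (with `%20` decoded to spaces); the helper names `md5_of_file` and `matches_record` are illustrative, not part of any repository tooling.

```python
import hashlib
from pathlib import Path

# Checksums copied from the bitstream.checksum field of this record,
# keyed by the (assumed) local file name of each downloaded bitstream.
EXPECTED_MD5 = {
    "TESE Francisco do Nascimento Junior.pdf": "794ee127f9a27d065eb71104d4849c0e",
    "license.txt": "4b8a02c7f2818eaf00dcf2260dd5eb08",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so a large PDF is not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path):
    """True if the file's MD5 equals the checksum listed for its name."""
    expected = EXPECTED_MD5.get(Path(path).name)
    return expected is not None and md5_of_file(path) == expected
```

For example, `matches_record("TESE Francisco do Nascimento Junior.pdf")` would return `True` only if the local copy hashes to the value recorded for the thesis PDF. MD5 is adequate for detecting accidental corruption but should not be relied on against deliberate tampering.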
repository.name.fl_str_mv Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE)
repository.mail.fl_str_mv attena@ufpe.br
_version_ 1802310720102072320