ScreenVar - a biclustering-based methodology for evaluating structural variants
Main author: | NASCIMENTO JÚNIOR, Francisco do |
---|---|
Publication date: | 2017 |
Document type: | Thesis |
Language: | eng |
Source title: | Repositório Institucional da UFPE |
Full text: | https://repositorio.ufpe.br/handle/123456789/25375 |
Abstract: | The importance of structural variants as a source of phenotypic variation has grown in recent years. At the same time, the number of tools that detect structural variations using Next-Generation Sequencing (NGS) has increased considerably with the dramatic drop in the cost of sequencing in the last ten years. Consequently, the proper evaluation of detected structural variants has gained prominence because of the uncertainty surrounding such alterations, with important implications for researchers and clinicians who scrutinize the human genome. These trends have raised interest in careful procedures for assessing the output of variant calling tools. Here, we characterize the technical details of structural variant detection that can affect the accuracy of detection methods, and we discuss the most important caveats of the tool evaluation process. This study emphasizes common assumptions, a variety of possible limitations, and valuable insights drawn from state-of-the-art CNV (Copy Number Variation) detection tools. Among these points, a frequently mentioned and extremely important one is the lack of a gold standard for structural variants and its impact on the evaluation of existing detection tools. Next, this document describes a biclustering-based methodology to screen a collection of structural variants and provide a set of reliable events, based on a defined equivalence criterion, that is supported by different studies. Finally, we carry out experiments with the proposed methodology using the Database of Genomic Variants (DGV) as input data, and we find relevant groups of equivalent variants across different studies. In summary, this thesis shows that there is an alternative approach to the open problem of the lack of a gold standard for evaluating structural variants. |
id |
UFPE_765bdb943b3fe864eec3669a61f2860f |
---|---|
oai_identifier_str |
oai:repositorio.ufpe.br:123456789/25375 |
network_acronym_str |
UFPE |
network_name_str |
Repositório Institucional da UFPE |
repository_id_str |
2221 |
spelling |
NASCIMENTO JÚNIOR, Francisco do; GUIMARÃES, Katia Silva; CAPES; Universidade Federal de Pernambuco; Programa de Pos Graduacao em Ciencia da Computacao; TESE Francisco do Nascimento Junior.pdf (application/pdf, 1104753); 2019-10-25T12:06:47; https://repositorio.ufpe.br/oai/request; opendoar:2221 |
dc.title.pt_BR.fl_str_mv |
ScreenVar - a biclustering-based methodology for evaluating structural variants |
title |
ScreenVar - a biclustering-based methodology for evaluating structural variants |
spellingShingle |
ScreenVar - a biclustering-based methodology for evaluating structural variants NASCIMENTO JÚNIOR, Francisco do Computer science Computational biology |
title_short |
ScreenVar - a biclustering-based methodology for evaluating structural variants |
title_full |
ScreenVar - a biclustering-based methodology for evaluating structural variants |
title_fullStr |
ScreenVar - a biclustering-based methodology for evaluating structural variants |
title_full_unstemmed |
ScreenVar - a biclustering-based methodology for evaluating structural variants |
title_sort |
ScreenVar - a biclustering-based methodology for evaluating structural variants |
author |
NASCIMENTO JÚNIOR, Francisco do |
author_facet |
NASCIMENTO JÚNIOR, Francisco do |
author_role |
author |
dc.contributor.authorLattes.pt_BR.fl_str_mv |
http://lattes.cnpq.br/6683834339342079 |
dc.contributor.advisorLattes.pt_BR.fl_str_mv |
http://lattes.cnpq.br/8994178236264483 |
dc.contributor.author.fl_str_mv |
NASCIMENTO JÚNIOR, Francisco do |
dc.contributor.advisor1.fl_str_mv |
GUIMARÃES, Katia Silva |
contributor_str_mv |
GUIMARÃES, Katia Silva |
dc.subject.por.fl_str_mv |
Computer science Computational biology |
topic |
Computer science Computational biology |
description |
The importance of structural variants as a source of phenotypic variation has grown in recent years. At the same time, the number of tools that detect structural variations using Next-Generation Sequencing (NGS) has increased considerably with the dramatic drop in the cost of sequencing in the last ten years. Consequently, the proper evaluation of detected structural variants has gained prominence because of the uncertainty surrounding such alterations, with important implications for researchers and clinicians who scrutinize the human genome. These trends have raised interest in careful procedures for assessing the output of variant calling tools. Here, we characterize the technical details of structural variant detection that can affect the accuracy of detection methods, and we discuss the most important caveats of the tool evaluation process. This study emphasizes common assumptions, a variety of possible limitations, and valuable insights drawn from state-of-the-art CNV (Copy Number Variation) detection tools. Among these points, a frequently mentioned and extremely important one is the lack of a gold standard for structural variants and its impact on the evaluation of existing detection tools. Next, this document describes a biclustering-based methodology to screen a collection of structural variants and provide a set of reliable events, based on a defined equivalence criterion, that is supported by different studies. Finally, we carry out experiments with the proposed methodology using the Database of Genomic Variants (DGV) as input data, and we find relevant groups of equivalent variants across different studies. In summary, this thesis shows that there is an alternative approach to the open problem of the lack of a gold standard for evaluating structural variants. |
publishDate |
2017 |
dc.date.issued.fl_str_mv |
2017-02-17 |
dc.date.accessioned.fl_str_mv |
2018-08-03T19:38:31Z |
dc.date.available.fl_str_mv |
2018-08-03T19:38:31Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/doctoralThesis |
format |
doctoralThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://repositorio.ufpe.br/handle/123456789/25375 |
url |
https://repositorio.ufpe.br/handle/123456789/25375 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
Attribution-NonCommercial-NoDerivs 3.0 Brazil http://creativecommons.org/licenses/by-nc-nd/3.0/br/ info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Attribution-NonCommercial-NoDerivs 3.0 Brazil http://creativecommons.org/licenses/by-nc-nd/3.0/br/ |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
Universidade Federal de Pernambuco |
dc.publisher.program.fl_str_mv |
Programa de Pos Graduacao em Ciencia da Computacao |
dc.publisher.initials.fl_str_mv |
UFPE |
dc.publisher.country.fl_str_mv |
Brazil |
publisher.none.fl_str_mv |
Universidade Federal de Pernambuco |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFPE instname:Universidade Federal de Pernambuco (UFPE) instacron:UFPE |
instname_str |
Universidade Federal de Pernambuco (UFPE) |
instacron_str |
UFPE |
institution |
UFPE |
reponame_str |
Repositório Institucional da UFPE |
collection |
Repositório Institucional da UFPE |
bitstream.url.fl_str_mv |
https://repositorio.ufpe.br/bitstream/123456789/25375/5/TESE%20Francisco%20do%20Nascimento%20Junior.pdf.jpg https://repositorio.ufpe.br/bitstream/123456789/25375/1/TESE%20Francisco%20do%20Nascimento%20Junior.pdf https://repositorio.ufpe.br/bitstream/123456789/25375/2/license_rdf https://repositorio.ufpe.br/bitstream/123456789/25375/3/license.txt https://repositorio.ufpe.br/bitstream/123456789/25375/4/TESE%20Francisco%20do%20Nascimento%20Junior.pdf.txt |
bitstream.checksum.fl_str_mv |
960786ed718ed5c8c10ccc28ddc507fc 794ee127f9a27d065eb71104d4849c0e e39d27027a6cc9cb039ad269a5db8e34 4b8a02c7f2818eaf00dcf2260dd5eb08 bb447788a3d2b21929aa469002806c76 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE) |
repository.mail.fl_str_mv |
attena@ufpe.br |
_version_ |
1802310720102072320 |