Practical dynamic reconstruction of control flow graphs

Detalhes bibliográficos
Autor(a) principal: Andrei Rimsa Álvares
Data de Publicação: 2020
Tipo de documento: Tese
Idioma: eng
Título da fonte: Repositório Institucional da UFMG
Texto Completo: http://hdl.handle.net/1843/36523
https://orcid.org/0000-0002-0151-2900
Resumo: The automatic recovery of a program’s high-level representation from its binary version is a well-studied problem in programming languages. However, most of the solutions to this problem are based on purely static approaches: techniques such as dataflow analyses or type inference are used to convert the bytes that constitute the executable code back into a control flow graph (CFG). This work departs from such a modus operandi to show that a dynamic analysis can be effective and useful, both as a standalone technique, and as a way to enhance the precision of static approaches. The experimental results provide evidence that completeness, i.e., the ability to conclude that the entire CFG has been discovered, is achievable on many functions that are part of industry-strong benchmarks. Experiments also indicate that dynamic information greatly enhances the ability of DynInst, a state-of-the-art binary reconstructor, to deal with code stripped of debugging information. These results were obtained with CFGgrind, a new implementation of a dynamic code reconstructor, built on top of valgrind. When applied to cBench, CFGgrind is 9% faster than callgrind, valgrind’s tool used to track targets of function calls; and 7% faster in Spec Cpu2017. CFGgrind recovers the complete CFG of 40% of all the procedures invoked during the standard execution of programs in Spec Cpu2017, and 37% in cBench. When combined with CFGgrind, DynInst finds 15% more CFGs for cBench, and 7% more CFGs for Spec Cpu2017. Finally, CFGgrind is more than 7 times faster than DCFG, a CFG reconstructor from Intel, and 1.28 times faster than bfTrace, a CFG reconstructor used in research. CFGgrind is also more precise than these two tools, handling operating system signals, shared code in functions, and unaligned instructions; besides supporting multi-threaded programs, exact profiling and incremental refinements.
id UFMG_679a94718a721f7b6ec0cd03af1afc0f
oai_identifier_str oai:repositorio.ufmg.br:1843/36523
network_acronym_str UFMG
network_name_str Repositório Institucional da UFMG
repository_id_str
spelling Fernando Magno Quintão Pereirahttp://lattes.cnpq.br/4608001746330875José Nelson Amaralhttp://lattes.cnpq.br/9725605913159774José Eduardo MoreiraRodolfo Jardim de AzevedoLeonardo Barbosa e OliveiraMarcos Augusto Menezes Vieirahttp://lattes.cnpq.br/0531001850708530Andrei Rimsa Álvares2021-06-19T20:18:12Z2021-06-19T20:18:12Z2020-11-05http://hdl.handle.net/1843/36523https://orcid.org/0000-0002-0151-2900The automatic recovery of a program’s high-level representation from its binary version is a well-studied problem in programming languages. However, most of the solutions to this problem are based on purely static approaches: techniques such as dataflow analyses or type inference are used to convert the bytes that constitute the executable code back into a control flow graph (CFG). This work departs from such a modus operandi to show that a dynamic analysis can be effective and useful, both as a standalone technique, and as a way to enhance the precision of static approaches. The experimental results provide evidence that completeness, i.e., the ability to conclude that the entire CFG has been discovered, is achievable on many functions that are part of industry-strong benchmarks. Experiments also indicate that dynamic information greatly enhances the ability of DynInst, a state-of-the-art binary reconstructor, to deal with code stripped of debugging information. These results were obtained with CFGgrind, a new implementation of a dynamic code reconstructor, built on top of valgrind. When applied to cBench, CFGgrind is 9% faster than callgrind, valgrind’s tool used to track targets of function calls; and 7% faster in Spec Cpu2017. CFGgrind recovers the complete CFG of 40% of all the procedures invoked during the standard execution of programs in Spec Cpu2017, and 37% in cBench. When combined with CFGgrind, DynInst finds 15% more CFGs for cBench, and 7% more CFGs for Spec Cpu2017. Finally, CFGgrind is more than 7 times faster than DCFG, a CFG reconstructor from Intel, and 1.28 times faster than bfTrace, a CFG reconstructor used in research. CFGgrind is also more precise than these two tools, handling operating system signals, shared code in functions, and unaligned instructions; besides supporting multi-threaded programs, exact profiling and incremental refinements.A recuperação automática de informações de alto-nível de programas em formato binário é um importante problema estudado em linguagens de programação. Contudo, a maioria das soluções para esse problema são baseadas puramente em abordagens estáticas: técnicas como análise de fluxo de dados ou inferência de tipos são utilizadas para converter os bytes que constituem o executável de volta para o formato de um grafo de fluxo de controle (GFC). Esse trabalho se afasta desse tal modus operandi para mostrar que análises dinâmicas podem ser efetivas e úteis, tanto como uma técnica independente, quanto como uma forma de melhorar a precisão das abordagens estáticas. Os resultados experimentais mostram evidências que completude, ou seja, a habilidade de concluir que todos os caminhos de um GFC foram cobertos, é alcançada em muitas funções de benchmarks de nível industrial. Os experimentos também indicam que informações coletadas dinamicamente melhoram consideravelmente a habilidade de DynInst, um reconstrutor estático estado-da-arte, de lidar com códigos binários sem símbolos de depuração. Esses resultados foram obtidos com CFGgrind, um reconstrutor dinâmico de códigos binários, construído sobre a infraestrutura de valgrind. Quando aplicado sobre cBench, CFGgrind é 9% mais rápido que callgrind, uma ferramenta de valgrind capaz de rastrear alvos de chamadas de funções; e 7% mais rápido em Spec Cpu2017. CFGgrind recupera GFCs completos em 40% de todos os procedimentos invocados durante a execução padrão de programas em Spec Cpu2017, e 37% em cBench. Quando combinado com CFGgrind, DynInst encontra 15% mais GFCs para cBench e 7% mais GFCs para Spec Cpu2017. Finalmente, CFGgrind é 7 vezes mais rápido que DCFG, um reconstrutor de GFC desenvolvido pela Intel, e é 1.28 vezes mais rápido que bfTrace, um reconstrutor usado em pesquisa. CFGgrind é também mais preciso que essas duas ferramentas. Ele suporta tratamento de sinais de sistema operacional, códigos compartilhados em funções, instruções desalinhadas, programas multi-thread, profiling exato e refinamentos incrementais.engUniversidade Federal de Minas GeraisPrograma de Pós-Graduação em Ciência da ComputaçãoUFMGBrasilICX - DEPARTAMENTO DE CIÊNCIA DA COMPUTAÇÃOhttp://creativecommons.org/licenses/by/3.0/pt/info:eu-repo/semantics/openAccessComputação – TesesCompiladores (Programas de computador) - TesesLinguagem de programação (Computadores) – TesesFluxo de dados (Computação) – TesesControl flow graphDynamic analysisCode instrumentationPractical dynamic reconstruction of control flow graphsReconstrução dinâmica prática de grafos de fluxo de controleinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/doctoralThesisreponame:Repositório Institucional da UFMGinstname:Universidade Federal de Minas Gerais (UFMG)instacron:UFMGORIGINALthesis.pdfthesis.pdfapplication/pdf4521084https://repositorio.ufmg.br/bitstream/1843/36523/8/thesis.pdf8201bd8adf1736c85e15009d0b30e370MD58LICENSElicense.txtlicense.txttext/plain; charset=utf-82118https://repositorio.ufmg.br/bitstream/1843/36523/9/license.txtcda590c95a0b51b4d15f60c9642ca272MD59CC-LICENSElicense_rdflicense_rdfapplication/rdf+xml; charset=utf-8914https://repositorio.ufmg.br/bitstream/1843/36523/2/license_rdff9944a358a0c32770bd9bed185bb5395MD521843/365232021-06-19 17:18:13.007oai:repositorio.ufmg.br:1843/36523TElDRU7Dh0EgREUgRElTVFJJQlVJw4fDg08gTsODTy1FWENMVVNJVkEgRE8gUkVQT1NJVMOTUklPIElOU1RJVFVDSU9OQUwgREEgVUZNRwoKQ29tIGEgYXByZXNlbnRhw6fDo28gZGVzdGEgbGljZW7Dp2EsIHZvY8OqIChvIGF1dG9yIChlcykgb3UgbyB0aXR1bGFyIGRvcyBkaXJlaXRvcyBkZSBhdXRvcikgY29uY2VkZSBhbyBSZXBvc2l0w7NyaW8gSW5zdGl0dWNpb25hbCBkYSBVRk1HIChSSS1VRk1HKSBvIGRpcmVpdG8gbsOjbyBleGNsdXNpdm8gZSBpcnJldm9nw6F2ZWwgZGUgcmVwcm9kdXppciBlL291IGRpc3RyaWJ1aXIgYSBzdWEgcHVibGljYcOnw6NvIChpbmNsdWluZG8gbyByZXN1bW8pIHBvciB0b2RvIG8gbXVuZG8gbm8gZm9ybWF0byBpbXByZXNzbyBlIGVsZXRyw7RuaWNvIGUgZW0gcXVhbHF1ZXIgbWVpbywgaW5jbHVpbmRvIG9zIGZvcm1hdG9zIMOhdWRpbyBvdSB2w61kZW8uCgpWb2PDqiBkZWNsYXJhIHF1ZSBjb25oZWNlIGEgcG9sw610aWNhIGRlIGNvcHlyaWdodCBkYSBlZGl0b3JhIGRvIHNldSBkb2N1bWVudG8gZSBxdWUgY29uaGVjZSBlIGFjZWl0YSBhcyBEaXJldHJpemVzIGRvIFJJLVVGTUcuCgpWb2PDqiBjb25jb3JkYSBxdWUgbyBSZXBvc2l0w7NyaW8gSW5zdGl0dWNpb25hbCBkYSBVRk1HIHBvZGUsIHNlbSBhbHRlcmFyIG8gY29udGXDumRvLCB0cmFuc3BvciBhIHN1YSBwdWJsaWNhw6fDo28gcGFyYSBxdWFscXVlciBtZWlvIG91IGZvcm1hdG8gcGFyYSBmaW5zIGRlIHByZXNlcnZhw6fDo28uCgpWb2PDqiB0YW1iw6ltIGNvbmNvcmRhIHF1ZSBvIFJlcG9zaXTDs3JpbyBJbnN0aXR1Y2lvbmFsIGRhIFVGTUcgcG9kZSBtYW50ZXIgbWFpcyBkZSB1bWEgY8OzcGlhIGRlIHN1YSBwdWJsaWNhw6fDo28gcGFyYSBmaW5zIGRlIHNlZ3VyYW7Dp2EsIGJhY2stdXAgZSBwcmVzZXJ2YcOnw6NvLgoKVm9jw6ogZGVjbGFyYSBxdWUgYSBzdWEgcHVibGljYcOnw6NvIMOpIG9yaWdpbmFsIGUgcXVlIHZvY8OqIHRlbSBvIHBvZGVyIGRlIGNvbmNlZGVyIG9zIGRpcmVpdG9zIGNvbnRpZG9zIG5lc3RhIGxpY2Vuw6dhLiBWb2PDqiB0YW1iw6ltIGRlY2xhcmEgcXVlIG8gZGVww7NzaXRvIGRlIHN1YSBwdWJsaWNhw6fDo28gbsOjbywgcXVlIHNlamEgZGUgc2V1IGNvbmhlY2ltZW50bywgaW5mcmluZ2UgZGlyZWl0b3MgYXV0b3JhaXMgZGUgbmluZ3XDqW0uCgpDYXNvIGEgc3VhIHB1YmxpY2HDp8OjbyBjb250ZW5oYSBtYXRlcmlhbCBxdWUgdm9jw6ogbsOjbyBwb3NzdWkgYSB0aXR1bGFyaWRhZGUgZG9zIGRpcmVpdG9zIGF1dG9yYWlzLCB2b2PDqiBkZWNsYXJhIHF1ZSBvYnRldmUgYSBwZXJtaXNzw6NvIGlycmVzdHJpdGEgZG8gZGV0ZW50b3IgZG9zIGRpcmVpdG9zIGF1dG9yYWlzIHBhcmEgY29uY2VkZXIgYW8gUmVwb3NpdMOzcmlvIEluc3RpdHVjaW9uYWwgZGEgVUZNRyBvcyBkaXJlaXRvcyBhcHJlc2VudGFkb3MgbmVzdGEgbGljZW7Dp2EsIGUgcXVlIGVzc2UgbWF0ZXJpYWwgZGUgcHJvcHJpZWRhZGUgZGUgdGVyY2Vpcm9zIGVzdMOhIGNsYXJhbWVudGUgaWRlbnRpZmljYWRvIGUgcmVjb25oZWNpZG8gbm8gdGV4dG8gb3Ugbm8gY29udGXDumRvIGRhIHB1YmxpY2HDp8OjbyBvcmEgZGVwb3NpdGFkYS4KCkNBU08gQSBQVUJMSUNBw4fDg08gT1JBIERFUE9TSVRBREEgVEVOSEEgU0lETyBSRVNVTFRBRE8gREUgVU0gUEFUUk9Dw41OSU8gT1UgQVBPSU8gREUgVU1BIEFHw4pOQ0lBIERFIEZPTUVOVE8gT1UgT1VUUk8gT1JHQU5JU01PLCBWT0PDiiBERUNMQVJBIFFVRSBSRVNQRUlUT1UgVE9ET1MgRSBRVUFJU1FVRVIgRElSRUlUT1MgREUgUkVWSVPDg08gQ09NTyBUQU1Cw4lNIEFTIERFTUFJUyBPQlJJR0HDh8OVRVMgRVhJR0lEQVMgUE9SIENPTlRSQVRPIE9VIEFDT1JETy4KCk8gUmVwb3NpdMOzcmlvIEluc3RpdHVjaW9uYWwgZGEgVUZNRyBzZSBjb21wcm9tZXRlIGEgaWRlbnRpZmljYXIgY2xhcmFtZW50ZSBvIHNldSBub21lKHMpIG91IG8ocykgbm9tZXMocykgZG8ocykgZGV0ZW50b3IoZXMpIGRvcyBkaXJlaXRvcyBhdXRvcmFpcyBkYSBwdWJsaWNhw6fDo28sIGUgbsOjbyBmYXLDoSBxdWFscXVlciBhbHRlcmHDp8OjbywgYWzDqW0gZGFxdWVsYXMgY29uY2VkaWRhcyBwb3IgZXN0YSBsaWNlbsOnYS4KRepositório de PublicaçõesPUBhttps://repositorio.ufmg.br/oaiopendoar:2021-06-19T20:18:13Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG)false
dc.title.pt_BR.fl_str_mv Practical dynamic reconstruction of control flow graphs
dc.title.alternative.pt_BR.fl_str_mv Reconstrução dinâmica prática de grafos de fluxo de controle
title Practical dynamic reconstruction of control flow graphs
spellingShingle Practical dynamic reconstruction of control flow graphs
Andrei Rimsa Álvares
Control flow graph
Dynamic analysis
Code instrumentation
Computação – Teses
Compiladores (Programas de computador) - Teses
Linguagem de programação (Computadores) – Teses
Fluxo de dados (Computação) – Teses
title_short Practical dynamic reconstruction of control flow graphs
title_full Practical dynamic reconstruction of control flow graphs
title_fullStr Practical dynamic reconstruction of control flow graphs
title_full_unstemmed Practical dynamic reconstruction of control flow graphs
title_sort Practical dynamic reconstruction of control flow graphs
author Andrei Rimsa Álvares
author_facet Andrei Rimsa Álvares
author_role author
dc.contributor.advisor1.fl_str_mv Fernando Magno Quintão Pereira
dc.contributor.advisor1Lattes.fl_str_mv http://lattes.cnpq.br/4608001746330875
dc.contributor.advisor2.fl_str_mv José Nelson Amaral
dc.contributor.advisor2Lattes.fl_str_mv http://lattes.cnpq.br/9725605913159774
dc.contributor.referee1.fl_str_mv José Eduardo Moreira
dc.contributor.referee2.fl_str_mv Rodolfo Jardim de Azevedo
dc.contributor.referee3.fl_str_mv Leonardo Barbosa e Oliveira
dc.contributor.referee4.fl_str_mv Marcos Augusto Menezes Vieira
dc.contributor.authorLattes.fl_str_mv http://lattes.cnpq.br/0531001850708530
dc.contributor.author.fl_str_mv Andrei Rimsa Álvares
contributor_str_mv Fernando Magno Quintão Pereira
José Nelson Amaral
José Eduardo Moreira
Rodolfo Jardim de Azevedo
Leonardo Barbosa e Oliveira
Marcos Augusto Menezes Vieira
dc.subject.por.fl_str_mv Control flow graph
Dynamic analysis
Code instrumentation
topic Control flow graph
Dynamic analysis
Code instrumentation
Computação – Teses
Compiladores (Programas de computador) - Teses
Linguagem de programação (Computadores) – Teses
Fluxo de dados (Computação) – Teses
dc.subject.other.pt_BR.fl_str_mv Computação – Teses
Compiladores (Programas de computador) - Teses
Linguagem de programação (Computadores) – Teses
Fluxo de dados (Computação) – Teses
description The automatic recovery of a program’s high-level representation from its binary version is a well-studied problem in programming languages. However, most of the solutions to this problem are based on purely static approaches: techniques such as dataflow analyses or type inference are used to convert the bytes that constitute the executable code back into a control flow graph (CFG). This work departs from such a modus operandi to show that a dynamic analysis can be effective and useful, both as a standalone technique, and as a way to enhance the precision of static approaches. The experimental results provide evidence that completeness, i.e., the ability to conclude that the entire CFG has been discovered, is achievable on many functions that are part of industry-strong benchmarks. Experiments also indicate that dynamic information greatly enhances the ability of DynInst, a state-of-the-art binary reconstructor, to deal with code stripped of debugging information. These results were obtained with CFGgrind, a new implementation of a dynamic code reconstructor, built on top of valgrind. When applied to cBench, CFGgrind is 9% faster than callgrind, valgrind’s tool used to track targets of function calls; and 7% faster in Spec Cpu2017. CFGgrind recovers the complete CFG of 40% of all the procedures invoked during the standard execution of programs in Spec Cpu2017, and 37% in cBench. When combined with CFGgrind, DynInst finds 15% more CFGs for cBench, and 7% more CFGs for Spec Cpu2017. Finally, CFGgrind is more than 7 times faster than DCFG, a CFG reconstructor from Intel, and 1.28 times faster than bfTrace, a CFG reconstructor used in research. CFGgrind is also more precise than these two tools, handling operating system signals, shared code in functions, and unaligned instructions; besides supporting multi-threaded programs, exact profiling and incremental refinements.
publishDate 2020
dc.date.issued.fl_str_mv 2020-11-05
dc.date.accessioned.fl_str_mv 2021-06-19T20:18:12Z
dc.date.available.fl_str_mv 2021-06-19T20:18:12Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/1843/36523
dc.identifier.orcid.pt_BR.fl_str_mv https://orcid.org/0000-0002-0151-2900
url http://hdl.handle.net/1843/36523
https://orcid.org/0000-0002-0151-2900
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv http://creativecommons.org/licenses/by/3.0/pt/
info:eu-repo/semantics/openAccess
rights_invalid_str_mv http://creativecommons.org/licenses/by/3.0/pt/
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Universidade Federal de Minas Gerais
dc.publisher.program.fl_str_mv Programa de Pós-Graduação em Ciência da Computação
dc.publisher.initials.fl_str_mv UFMG
dc.publisher.country.fl_str_mv Brasil
dc.publisher.department.fl_str_mv ICX - DEPARTAMENTO DE CIÊNCIA DA COMPUTAÇÃO
publisher.none.fl_str_mv Universidade Federal de Minas Gerais
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFMG
instname:Universidade Federal de Minas Gerais (UFMG)
instacron:UFMG
instname_str Universidade Federal de Minas Gerais (UFMG)
instacron_str UFMG
institution UFMG
reponame_str Repositório Institucional da UFMG
collection Repositório Institucional da UFMG
bitstream.url.fl_str_mv https://repositorio.ufmg.br/bitstream/1843/36523/8/thesis.pdf
https://repositorio.ufmg.br/bitstream/1843/36523/9/license.txt
https://repositorio.ufmg.br/bitstream/1843/36523/2/license_rdf
bitstream.checksum.fl_str_mv 8201bd8adf1736c85e15009d0b30e370
cda590c95a0b51b4d15f60c9642ca272
f9944a358a0c32770bd9bed185bb5395
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG)
repository.mail.fl_str_mv
_version_ 1797971149880033280