Practical dynamic reconstruction of control flow graphs
Main author: | Andrei Rimsa Álvares |
---|---|
Publication date: | 2020 |
Document type: | Thesis |
Language: | eng |
Source title: | Repositório Institucional da UFMG |
Full text: | http://hdl.handle.net/1843/36523 https://orcid.org/0000-0002-0151-2900 |
Abstract: | The automatic recovery of a program’s high-level representation from its binary version is a well-studied problem in programming languages. However, most solutions to this problem are purely static: techniques such as dataflow analysis or type inference are used to convert the bytes that constitute the executable code back into a control flow graph (CFG). This work departs from that modus operandi to show that dynamic analysis can be effective and useful, both as a standalone technique and as a way to enhance the precision of static approaches. The experimental results provide evidence that completeness, i.e., the ability to conclude that the entire CFG has been discovered, is achievable for many functions in industrial-strength benchmarks. Experiments also indicate that dynamic information greatly enhances the ability of DynInst, a state-of-the-art binary reconstructor, to deal with code stripped of debugging information. These results were obtained with CFGgrind, a new implementation of a dynamic code reconstructor built on top of valgrind. When applied to cBench, CFGgrind is 9% faster than callgrind, valgrind’s tool for tracking the targets of function calls; it is 7% faster on Spec Cpu2017. CFGgrind recovers the complete CFG of 40% of all the procedures invoked during the standard execution of programs in Spec Cpu2017, and of 37% in cBench. When combined with CFGgrind, DynInst finds 15% more CFGs for cBench and 7% more CFGs for Spec Cpu2017. Finally, CFGgrind is more than 7 times faster than DCFG, a CFG reconstructor from Intel, and 1.28 times faster than bfTrace, a CFG reconstructor used in research. CFGgrind is also more precise than these two tools: it handles operating system signals, shared code in functions, and unaligned instructions, and it supports multi-threaded programs, exact profiling, and incremental refinements. |
id |
UFMG_679a94718a721f7b6ec0cd03af1afc0f |
---|---|
oai_identifier_str |
oai:repositorio.ufmg.br:1843/36523 |
network_acronym_str |
UFMG |
network_name_str |
Repositório Institucional da UFMG |
repository_id_str |
|
dc.title.pt_BR.fl_str_mv |
Practical dynamic reconstruction of control flow graphs |
dc.title.alternative.pt_BR.fl_str_mv |
Reconstrução dinâmica prática de grafos de fluxo de controle |
title |
Practical dynamic reconstruction of control flow graphs |
title_short |
Practical dynamic reconstruction of control flow graphs |
title_full |
Practical dynamic reconstruction of control flow graphs |
title_fullStr |
Practical dynamic reconstruction of control flow graphs |
title_full_unstemmed |
Practical dynamic reconstruction of control flow graphs |
title_sort |
Practical dynamic reconstruction of control flow graphs |
author |
Andrei Rimsa Álvares |
author_facet |
Andrei Rimsa Álvares |
author_role |
author |
dc.contributor.advisor1.fl_str_mv |
Fernando Magno Quintão Pereira |
dc.contributor.advisor1Lattes.fl_str_mv |
http://lattes.cnpq.br/4608001746330875 |
dc.contributor.advisor2.fl_str_mv |
José Nelson Amaral |
dc.contributor.advisor2Lattes.fl_str_mv |
http://lattes.cnpq.br/9725605913159774 |
dc.contributor.referee1.fl_str_mv |
José Eduardo Moreira |
dc.contributor.referee2.fl_str_mv |
Rodolfo Jardim de Azevedo |
dc.contributor.referee3.fl_str_mv |
Leonardo Barbosa e Oliveira |
dc.contributor.referee4.fl_str_mv |
Marcos Augusto Menezes Vieira |
dc.contributor.authorLattes.fl_str_mv |
http://lattes.cnpq.br/0531001850708530 |
dc.contributor.author.fl_str_mv |
Andrei Rimsa Álvares |
contributor_str_mv |
Fernando Magno Quintão Pereira; José Nelson Amaral; José Eduardo Moreira; Rodolfo Jardim de Azevedo; Leonardo Barbosa e Oliveira; Marcos Augusto Menezes Vieira |
dc.subject.por.fl_str_mv |
Control flow graph; Dynamic analysis; Code instrumentation |
topic |
Control flow graph; Dynamic analysis; Code instrumentation; Computação – Teses; Compiladores (Programas de computador) - Teses; Linguagem de programação (Computadores) – Teses; Fluxo de dados (Computação) – Teses |
dc.subject.other.pt_BR.fl_str_mv |
Computação – Teses; Compiladores (Programas de computador) - Teses; Linguagem de programação (Computadores) – Teses; Fluxo de dados (Computação) – Teses |
description |
The automatic recovery of a program’s high-level representation from its binary version is a well-studied problem in programming languages. However, most solutions to this problem are purely static: techniques such as dataflow analysis or type inference are used to convert the bytes that constitute the executable code back into a control flow graph (CFG). This work departs from that modus operandi to show that dynamic analysis can be effective and useful, both as a standalone technique and as a way to enhance the precision of static approaches. The experimental results provide evidence that completeness, i.e., the ability to conclude that the entire CFG has been discovered, is achievable for many functions in industrial-strength benchmarks. Experiments also indicate that dynamic information greatly enhances the ability of DynInst, a state-of-the-art binary reconstructor, to deal with code stripped of debugging information. These results were obtained with CFGgrind, a new implementation of a dynamic code reconstructor built on top of valgrind. When applied to cBench, CFGgrind is 9% faster than callgrind, valgrind’s tool for tracking the targets of function calls; it is 7% faster on Spec Cpu2017. CFGgrind recovers the complete CFG of 40% of all the procedures invoked during the standard execution of programs in Spec Cpu2017, and of 37% in cBench. When combined with CFGgrind, DynInst finds 15% more CFGs for cBench and 7% more CFGs for Spec Cpu2017. Finally, CFGgrind is more than 7 times faster than DCFG, a CFG reconstructor from Intel, and 1.28 times faster than bfTrace, a CFG reconstructor used in research. CFGgrind is also more precise than these two tools: it handles operating system signals, shared code in functions, and unaligned instructions, and it supports multi-threaded programs, exact profiling, and incremental refinements. |
publishDate |
2020 |
dc.date.issued.fl_str_mv |
2020-11-05 |
dc.date.accessioned.fl_str_mv |
2021-06-19T20:18:12Z |
dc.date.available.fl_str_mv |
2021-06-19T20:18:12Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/doctoralThesis |
format |
doctoralThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/1843/36523 |
dc.identifier.orcid.pt_BR.fl_str_mv |
https://orcid.org/0000-0002-0151-2900 |
url |
http://hdl.handle.net/1843/36523 https://orcid.org/0000-0002-0151-2900 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
http://creativecommons.org/licenses/by/3.0/pt/ info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
http://creativecommons.org/licenses/by/3.0/pt/ |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
Universidade Federal de Minas Gerais |
dc.publisher.program.fl_str_mv |
Programa de Pós-Graduação em Ciência da Computação |
dc.publisher.initials.fl_str_mv |
UFMG |
dc.publisher.country.fl_str_mv |
Brazil |
dc.publisher.department.fl_str_mv |
ICX - DEPARTAMENTO DE CIÊNCIA DA COMPUTAÇÃO |
publisher.none.fl_str_mv |
Universidade Federal de Minas Gerais |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFMG instname:Universidade Federal de Minas Gerais (UFMG) instacron:UFMG |
instname_str |
Universidade Federal de Minas Gerais (UFMG) |
instacron_str |
UFMG |
institution |
UFMG |
reponame_str |
Repositório Institucional da UFMG |
collection |
Repositório Institucional da UFMG |
bitstream.url.fl_str_mv |
https://repositorio.ufmg.br/bitstream/1843/36523/8/thesis.pdf https://repositorio.ufmg.br/bitstream/1843/36523/9/license.txt https://repositorio.ufmg.br/bitstream/1843/36523/2/license_rdf |
bitstream.checksum.fl_str_mv |
8201bd8adf1736c85e15009d0b30e370 cda590c95a0b51b4d15f60c9642ca272 f9944a358a0c32770bd9bed185bb5395 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG) |
repository.mail.fl_str_mv |
|
_version_ |
1797971149880033280 |