Reactive methodologies to infinite text processing

Bibliographic details
Main author: João Saffran de Rezende
Publication date: 2021
Document type: Master's dissertation
Language: English
Source: Repositório Institucional da UFMG
Full text: http://hdl.handle.net/1843/58695
Abstract: A string event is the occurrence of a specific pattern in the textual output of a program. Capturing and treating string events has several applications, such as log anonymization, error handling and user notification, web crawling, and code refactoring. However, there is currently no systematic approach to identifying and treating string events. This work formally defines string events and presents the theory and practice of a general framework for handling them. We demonstrate the effectiveness of this framework through two implementations built on it. First, we introduce ZheFuscator, a system that redacts occurrences of sensitive information in database logs. ZheFuscator is implemented as an extension to the Java Virtual Machine (JVM). It intercepts patterns of interest on the fly and requires no changes to the source code of the protected program. It can infer log formats and capture string events with minimal performance overhead: it is up to 14x faster than an equivalent brute-force approach, and it converges to a definitive grammar of the log format (e.g., of MySQL database logs) after observing fewer than 10 examples from typical logs. Second, we introduce a general notation for infinite text processing. This notation highlights commonalities among tasks that, although different in principle, encode the same essential challenges. We realize this notation in ZheLang, a reactive language that lets users combine basic operations to identify and treat string events. As a proof of concept, we show how ZheLang operators can be combined to implement applications as disparate as log obfuscators and search engines.
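To make the idea of a string event more concrete, the Python sketch below composes three small operators into a pipeline that detects pattern occurrences in a text stream arriving in arbitrary chunks and redacts them. It is an illustration only: it is not ZheFuscator (a JVM extension) or ZheLang, and the names `lines`, `on_event`, and `pipe` are hypothetical. It assumes records are newline-delimited and that a match never spans two records.

```python
import re
from typing import Callable, Iterable, Iterator

# Assumption: records are newline-delimited and matches never span records.


def lines(chunks: Iterable[str]) -> Iterator[str]:
    """Reassemble arbitrary text chunks into newline-delimited records.

    A record split across two chunks is buffered until its newline arrives,
    so a pattern that straddles a chunk boundary is still matched whole.
    """
    pending = ""
    for chunk in chunks:
        pending += chunk
        *complete, pending = pending.split("\n")
        yield from complete
    if pending:  # flush a last record that has no trailing newline
        yield pending


def on_event(pattern: str, handler: Callable[[re.Match], str]):
    """Build a stage that treats every match of `pattern` as a string event.

    `handler` receives each match and returns the text that replaces it;
    returning m.group(0) leaves the record unchanged (observe-only handler).
    """
    regex = re.compile(pattern)

    def stage(records: Iterable[str]) -> Iterator[str]:
        for record in records:
            yield regex.sub(handler, record)

    return stage


def pipe(source: Iterable[str], *stages) -> Iterator[str]:
    """Compose stages left to right; nothing runs until the result is consumed."""
    for stage in stages:
        source = stage(source)
    return iter(source)


if __name__ == "__main__":
    # A log arriving in chunks; the e-mail address is split across them.
    log_chunks = ["2021-03-23 login user=alice@exa", "mple.com ok\n2021-03-23 query id=7\n"]
    redact_email = on_event(r"[\w.]+@[\w.]+", lambda m: "<redacted>")
    for record in pipe(log_chunks, lines, redact_email):
        print(record)
    # prints:
    # 2021-03-23 login user=<redacted> ok
    # 2021-03-23 query id=7
```

Because every operator is a generator, the pipeline handles one record at a time instead of loading the whole stream; the exception is a record that never receives a newline, which keeps growing the `pending` buffer, so a real tool would need to bound it. A handler that returns `m.group(0)` turns the same stage into an observer (for error handling or user notification) rather than a rewriter.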
OAI identifier: oai:repositorio.ufmg.br:1843/58695
Advisor: Haniel Moreira Barbosa (Lattes: http://lattes.cnpq.br/6657126741011519)
Co-advisor: Fernando Magno Quintão Pereira
Examining committee: Mário Sérgio Ferreira Alvim Júnior; Rodrigo Geraldo Ribeiro
Author Lattes CV: http://lattes.cnpq.br/0824212408102657
Subjects: Computing; Programming languages (Computers); Reactive programming; Parsing (Computer grammar)
Subject headings: Computing – Theses; Programming languages (Computers) – Theses; Reactive programming – Theses; Parsing (Computer grammar) – Theses
Date issued: 2021-03-23
Date deposited (accessioned and made available): 2023-09-14T19:49:04Z
Type and version: master's thesis, published version (info:eu-repo/semantics/masterThesis, info:eu-repo/semantics/publishedVersion)
License: Creative Commons BY-NC-ND 3.0 (http://creativecommons.org/licenses/by-nc-nd/3.0/pt/)
Access: Open access
Publisher: Universidade Federal de Minas Gerais (UFMG), Brazil
Program: Programa de Pós-Graduação em Ciência da Computação (Graduate Program in Computer Science)
Department: ICX - Departamento de Ciência da Computação (Department of Computer Science)
Full-text PDF: https://repositorio.ufmg.br/bitstream/1843/58695/11/Diserta%c3%a7%c3%a3o.pdf (2,881,866 bytes; MD5 b67c25454199fdc464dfc59c1bc6df54)
License RDF: https://repositorio.ufmg.br/bitstream/1843/58695/9/license_rdf (811 bytes; MD5 cfd6801dba008cb6adbd9838b81582ab)
Deposit license: https://repositorio.ufmg.br/bitstream/1843/58695/12/license.txt (2,118 bytes; MD5 cda590c95a0b51b4d15f60c9642ca272)