Synthesis of first-order sentences using Ehrenfeucht–Fraïssé games and Boolean satisfiability

Detalhes bibliográficos
Autor(a) principal: Rocha, Thiago Alves
Data de Publicação: 2019
Tipo de documento: Tese
Idioma: eng
Título da fonte: Repositório Institucional da Universidade Federal do Ceará (UFC)
Texto Completo: http://www.repositorio.ufc.br/handle/riufc/51208
Resumo: In this work, we investigate the problem of synthesis of first-order sentences from samples of classified relational structures. In other words, we investigate the following problem: for a fixed class of relational structures, given a sample of classified structures, find a first-order sentence of minimum quantifier rank that is consistent with the sample. We consider the following classes of structures: monadic structures, equivalence structures, disjoint unions of linear orders, and strings represented by finite structures with a successor relation. We use results of the Ehrenfeucht–Fraïssé game on these classes of structures in order to design an algorithm to find such a sentence. For these classes of structures, the problem of determining whether the Duplicator has a winning strategy in an Ehrenfeucht–Fraïssé game is solved in polynomial time. We also introduce the distinguishability sentences, which are sentences that distinguish between two given structures. We define the distinguishability sentences based on necessary and sufficient conditions for a winning strategy in Ehrenfeucht–Fraïssé games. Our algorithm returns a Boolean combination of such sentences. We also show that any first-order sentence is equivalent to a Boolean combination of distinguishability sentences. Finally, we also show that our algorithm’s running time is polynomial in the size of the input. Since general first-order sentences are hard to read, we define a quantifier-free normal form (QNF) over the classes of structures we are considering. QNF sentences are defined over a richer vocabulary such that atomic formulas are an abbreviation of general first-order sentences over a standard vocabulary. Then, QNF sentences consist of Boolean combinations of such atomic sentences over this non-standard vocabulary. Moreover, we define a DNF version for QNF sentences. Then, given a sample of strings and the number of disjunctive clauses, we investigate the problem of finding a DNF formula that is consistent with the sample. We show that this problem is NP-complete and we solve it by a translation into Boolean satisfiability (SAT). We also present an extension of this problem that is robust concerning noisy samples. We solve this generalized version by a codification into the maximum satisfiability problem. Solutions to the problem of finding a QNF sentence in DNF such that the number of clauses is bounded may have a large number of literals per clause. Therefore, we consider a variation of this problem in which the maximum number of literals per clause is also given as input. This is essential since sentences with few clauses and few literals per clause are more compact and easier to interpret. Again, we show that this problem is NP-complete, and our approach for solving it is based on a reduction to the SAT. We also present extensions of these problems that are robust concerning noisy samples. In this case, a sentence may not be consistent with the input sample. We cover two approaches to deal with noisy samples. In the first approach, we consider a problem in which the goal is to find a sentence that classifies the maximum number of strings correctly. We solve this generalized version by a codification into the maximum satisfiability problem (MaxSAT). In our second approach, the goal is to find a sentence such that it does not correctly classify at most a given number of strings. We show thatthis problem concerning a limited number of errors is also NP-complete. Moreover, we give a SAT-based solution to this problem. Among the classes we are considering, strings are more appealing since they may be used to model text data, stress patterns in human languages, biological sequences, and sequences of symbolic data in general. As first-order logic over strings defines exactly the class of locally threshold testable (LTT) languages, our results can be useful in grammatical inference when the goal is to find a model of an LTT language from a sample of strings. In the field of grammatical inference, one of the main problems studied is the task of finding a language model consistent with a given sample of strings.
id UFC-7_aa6be38ac6bb819fbcd6911032d3a511
oai_identifier_str oai:repositorio.ufc.br:riufc/51208
network_acronym_str UFC-7
network_name_str Repositório Institucional da Universidade Federal do Ceará (UFC)
repository_id_str
spelling Synthesis of first-order sentences using Ehrenfeucht–Fraïssé games and Boolean satisfiabilitySynthesis of first-order sentences using Ehrenfeucht–Fraïssé games and Boolean satisfiabilityFormula synthesisGrammatical inferenceEhrenfeucht–Fraïssé gameIn this work, we investigate the problem of synthesis of first-order sentences from samples of classified relational structures. In other words, we investigate the following problem: for a fixed class of relational structures, given a sample of classified structures, find a first-order sentence of minimum quantifier rank that is consistent with the sample. We consider the following classes of structures: monadic structures, equivalence structures, disjoint unions of linear orders, and strings represented by finite structures with a successor relation. We use results of the Ehrenfeucht–Fraïssé game on these classes of structures in order to design an algorithm to find such a sentence. For these classes of structures, the problem of determining whether the Duplicator has a winning strategy in an Ehrenfeucht–Fraïssé game is solved in polynomial time. We also introduce the distinguishability sentences, which are sentences that distinguish between two given structures. We define the distinguishability sentences based on necessary and sufficient conditions for a winning strategy in Ehrenfeucht–Fraïssé games. Our algorithm returns a Boolean combination of such sentences. We also show that any first-order sentence is equivalent to a Boolean combination of distinguishability sentences. Finally, we also show that our algorithm’s running time is polynomial in the size of the input. Since general first-order sentences are hard to read, we define a quantifier-free normal form (QNF) over the classes of structures we are considering. QNF sentences are defined over a richer vocabulary such that atomic formulas are an abbreviation of general first-order sentences over a standard vocabulary. Then, QNF sentences consist of Boolean combinations of such atomic sentences over this non-standard vocabulary. Moreover, we define a DNF version for QNF sentences. Then, given a sample of strings and the number of disjunctive clauses, we investigate the problem of finding a DNF formula that is consistent with the sample. We show that this problem is NP-complete and we solve it by a translation into Boolean satisfiability (SAT). We also present an extension of this problem that is robust concerning noisy samples. We solve this generalized version by a codification into the maximum satisfiability problem. Solutions to the problem of finding a QNF sentence in DNF such that the number of clauses is bounded may have a large number of literals per clause. Therefore, we consider a variation of this problem in which the maximum number of literals per clause is also given as input. This is essential since sentences with few clauses and few literals per clause are more compact and easier to interpret. Again, we show that this problem is NP-complete, and our approach for solving it is based on a reduction to the SAT. We also present extensions of these problems that are robust concerning noisy samples. In this case, a sentence may not be consistent with the input sample. We cover two approaches to deal with noisy samples. In the first approach, we consider a problem in which the goal is to find a sentence that classifies the maximum number of strings correctly. We solve this generalized version by a codification into the maximum satisfiability problem (MaxSAT). In our second approach, the goal is to find a sentence such that it does not correctly classify at most a given number of strings. We show thatthis problem concerning a limited number of errors is also NP-complete. Moreover, we give a SAT-based solution to this problem. Among the classes we are considering, strings are more appealing since they may be used to model text data, stress patterns in human languages, biological sequences, and sequences of symbolic data in general. As first-order logic over strings defines exactly the class of locally threshold testable (LTT) languages, our results can be useful in grammatical inference when the goal is to find a model of an LTT language from a sample of strings. In the field of grammatical inference, one of the main problems studied is the task of finding a language model consistent with a given sample of strings.Neste trabalho, nós investigamos o problema da síntese de sentenças de primeira-ordem a partir de amostras de estruturas relacionais classificadas. Em outras palavras, nós consideramos o seguinte problema: para uma classe de estruturas relacionais fixa, dada uma amostra de estruturas classificadas, encontrar uma sentença de primeira-ordem de quantifier rank mínimo que é consistente com a amostra. Nós contemplamos as seguintes classes de estruturas: estruturas monádicas, estruturas de equivalência, uniões disjuntas de ordens lineares e strings representadas por estruturas finitas com uma relação de successor. Nós usamos resultados do jogo Ehrenfeucht–Fraïssé nessas classes de estruturas com a finalidade de projetar um algoritmo para encontrar tal sentença. Para essas classes de estruturas, o problema de determinar se um dos jogadores tem uma estratégia vencedora no jogo Ehrenfeucht–Fraïssé é resolvido em tempo polinomial. Nós também introduzimos sentenças de distinguibilidade que são sentenças que distinguem duas estruturas dadas. Nós definimos as sentenças de distinguibilidade usando condições necessárias e suficientes para uma estratégia vencedora no jogo Ehrenfeucht–Fraïssé. Nosso algoritmo retorna uma combinação Booleana de tais sentenças. Nós também mostramos que qualquer sentença de primeira-ordem é equivalente à uma combinação Booleana de sentenças de distinguibilidade. Finalmente, nós também mostramos que o tempo de execução do nosso algoritmo é polinomial no tamanho da amostra de entrada. Como, em geral, sentenças de primeira-ordem são dificeis de ler, nós definimos uma formal normal livre de quantificadores (QNF – do inglês: quantifier-free normal form) para as classes de estruturas que estamos considerando. Sentenças QNF são definidas com respeito a um vocabulário mais rico tal que sentenças atômicas são abreviações de sentenças arbitrárias de primeira-ordem sobre o vocabulário padrão. Dessa forma, sentenças QNF consistem de combinações Booleanas de tais sentenças atômicas sobre esse vocabulário não-padrão. Além disso, nós definimos uma forma normal disjuntiva (DNF – do inglês: disjunctive normal form) para sentenças QNF. Portanto, dada uma amostra de strings classificadas e o número de cláusulas disjuntivas, nós investigamos o problema de encontrar uma sentença QNF na DNF que é consistente com a amostra. Nós provamos que esse problema é NP-completo e mostramos uma solução baseada em uma codificação para o problema da satisfatibilidade Booleana (SAT). Soluções para o problema de encontrar uma sentença QNF na DNF tal que apenas o número de cláusulas é limitado podem ter um número grande de literais por cláusula. Portanto, nós consideramos uma variação desse problema no qual o número máximo de literais por cláusula também é fornecido como entrada. Isso é essencial pois sentenças com poucas cláusulas e poucos literais por cláusula são mais compactas e mais fáceis de interpretar. Novamente, nós mostramos que esse problema é NP-completo e nossa abordagem para resolvê-lo é baseada na redução para o SAT. Nós também apresentamos extensões desses problemas que são robustas com respeito a amostras ruidosas. Nesse caso, uma sentença pode não ser consistente com a amostra da entrada. Nós cobrimos duas abordagens para lidar com amostras com ruídos. Na primeira abordagem, nós consideramos o problema no qual o objetivo é encontraruma sentença que classifica corretamente o maior número de strings. Nós resolvemos essa versão generalizada através de uma codificação no problema da satisfatibilidade máxima (MaxSAT). Na nossa segunda abordagem, o objetivo é encontrar uma sentença que não classifica corretamente no máximo uma quantidade dada de strings. Nós mostramos que esse problema relativo a um número limitado de erros também é NP-completo. Além disso, nós apresentamos uma solução baseada no SAT para resolver esse problema. Dentre as classes que estamos considerando, as strings são mais interessantes pois elas podem ser usadas para modelar dados textuais, padrões de tonicidade em línguas humanas, sequências biológicas e sequências de dados simbólicos em geral. Como a lógica de primeira-ordem sobre strings define exatamente a classe das linguagens locally threshold testable (LTT), nossos resultados podem ser úteis em inferência gramatical quando o propósito é encontrar uma descrição formal de uma linguagem LTT a partir de uma amostra de strings. Na área de inferência gramatical, um dos principais problemas estudados é a tarefa de obter um modelo de linguagem consistente com uma amostra de strings classificadas.Martins, Ana Teresa de CastroFerreira, Francicleber MartinsRocha, Thiago Alves2020-04-08T16:11:57Z2020-04-08T16:11:57Z2019info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/doctoralThesisapplication/pdfROCHA, Thiago Alves. Synthesis of first-order sentences using Ehrenfeucht–Fraïssé games and Boolean satisfiability. 2019. 136 f. Tese (Doutorado em Ciência da Computação) - Universidade Federal do Ceará, Fortaleza, 2019.http://www.repositorio.ufc.br/handle/riufc/51208engreponame:Repositório Institucional da Universidade Federal do Ceará (UFC)instname:Universidade Federal do Ceará (UFC)instacron:UFCinfo:eu-repo/semantics/openAccess2020-04-08T16:11:57Zoai:repositorio.ufc.br:riufc/51208Repositório InstitucionalPUBhttp://www.repositorio.ufc.br/ri-oai/requestbu@ufc.br || repositorio@ufc.bropendoar:2024-09-11T19:03:56.558752Repositório Institucional da Universidade Federal do Ceará (UFC) - Universidade Federal do Ceará (UFC)false
dc.title.none.fl_str_mv Synthesis of first-order sentences using Ehrenfeucht–Fraïssé games and Boolean satisfiability
Synthesis of first-order sentences using Ehrenfeucht–Fraïssé games and Boolean satisfiability
title Synthesis of first-order sentences using Ehrenfeucht–Fraïssé games and Boolean satisfiability
spellingShingle Synthesis of first-order sentences using Ehrenfeucht–Fraïssé games and Boolean satisfiability
Rocha, Thiago Alves
Formula synthesis
Grammatical inference
Ehrenfeucht–Fraïssé game
title_short Synthesis of first-order sentences using Ehrenfeucht–Fraïssé games and Boolean satisfiability
title_full Synthesis of first-order sentences using Ehrenfeucht–Fraïssé games and Boolean satisfiability
title_fullStr Synthesis of first-order sentences using Ehrenfeucht–Fraïssé games and Boolean satisfiability
title_full_unstemmed Synthesis of first-order sentences using Ehrenfeucht–Fraïssé games and Boolean satisfiability
title_sort Synthesis of first-order sentences using Ehrenfeucht–Fraïssé games and Boolean satisfiability
author Rocha, Thiago Alves
author_facet Rocha, Thiago Alves
author_role author
dc.contributor.none.fl_str_mv Martins, Ana Teresa de Castro
Ferreira, Francicleber Martins
dc.contributor.author.fl_str_mv Rocha, Thiago Alves
dc.subject.por.fl_str_mv Formula synthesis
Grammatical inference
Ehrenfeucht–Fraïssé game
topic Formula synthesis
Grammatical inference
Ehrenfeucht–Fraïssé game
description In this work, we investigate the problem of synthesis of first-order sentences from samples of classified relational structures. In other words, we investigate the following problem: for a fixed class of relational structures, given a sample of classified structures, find a first-order sentence of minimum quantifier rank that is consistent with the sample. We consider the following classes of structures: monadic structures, equivalence structures, disjoint unions of linear orders, and strings represented by finite structures with a successor relation. We use results of the Ehrenfeucht–Fraïssé game on these classes of structures in order to design an algorithm to find such a sentence. For these classes of structures, the problem of determining whether the Duplicator has a winning strategy in an Ehrenfeucht–Fraïssé game is solved in polynomial time. We also introduce the distinguishability sentences, which are sentences that distinguish between two given structures. We define the distinguishability sentences based on necessary and sufficient conditions for a winning strategy in Ehrenfeucht–Fraïssé games. Our algorithm returns a Boolean combination of such sentences. We also show that any first-order sentence is equivalent to a Boolean combination of distinguishability sentences. Finally, we also show that our algorithm’s running time is polynomial in the size of the input. Since general first-order sentences are hard to read, we define a quantifier-free normal form (QNF) over the classes of structures we are considering. QNF sentences are defined over a richer vocabulary such that atomic formulas are an abbreviation of general first-order sentences over a standard vocabulary. Then, QNF sentences consist of Boolean combinations of such atomic sentences over this non-standard vocabulary. Moreover, we define a DNF version for QNF sentences. Then, given a sample of strings and the number of disjunctive clauses, we investigate the problem of finding a DNF formula that is consistent with the sample. We show that this problem is NP-complete and we solve it by a translation into Boolean satisfiability (SAT). We also present an extension of this problem that is robust concerning noisy samples. We solve this generalized version by a codification into the maximum satisfiability problem. Solutions to the problem of finding a QNF sentence in DNF such that the number of clauses is bounded may have a large number of literals per clause. Therefore, we consider a variation of this problem in which the maximum number of literals per clause is also given as input. This is essential since sentences with few clauses and few literals per clause are more compact and easier to interpret. Again, we show that this problem is NP-complete, and our approach for solving it is based on a reduction to the SAT. We also present extensions of these problems that are robust concerning noisy samples. In this case, a sentence may not be consistent with the input sample. We cover two approaches to deal with noisy samples. In the first approach, we consider a problem in which the goal is to find a sentence that classifies the maximum number of strings correctly. We solve this generalized version by a codification into the maximum satisfiability problem (MaxSAT). In our second approach, the goal is to find a sentence such that it does not correctly classify at most a given number of strings. We show thatthis problem concerning a limited number of errors is also NP-complete. Moreover, we give a SAT-based solution to this problem. Among the classes we are considering, strings are more appealing since they may be used to model text data, stress patterns in human languages, biological sequences, and sequences of symbolic data in general. As first-order logic over strings defines exactly the class of locally threshold testable (LTT) languages, our results can be useful in grammatical inference when the goal is to find a model of an LTT language from a sample of strings. In the field of grammatical inference, one of the main problems studied is the task of finding a language model consistent with a given sample of strings.
publishDate 2019
dc.date.none.fl_str_mv 2019
2020-04-08T16:11:57Z
2020-04-08T16:11:57Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv ROCHA, Thiago Alves. Synthesis of first-order sentences using Ehrenfeucht–Fraïssé games and Boolean satisfiability. 2019. 136 f. Tese (Doutorado em Ciência da Computação) - Universidade Federal do Ceará, Fortaleza, 2019.
http://www.repositorio.ufc.br/handle/riufc/51208
identifier_str_mv ROCHA, Thiago Alves. Synthesis of first-order sentences using Ehrenfeucht–Fraïssé games and Boolean satisfiability. 2019. 136 f. Tese (Doutorado em Ciência da Computação) - Universidade Federal do Ceará, Fortaleza, 2019.
url http://www.repositorio.ufc.br/handle/riufc/51208
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Institucional da Universidade Federal do Ceará (UFC)
instname:Universidade Federal do Ceará (UFC)
instacron:UFC
instname_str Universidade Federal do Ceará (UFC)
instacron_str UFC
institution UFC
reponame_str Repositório Institucional da Universidade Federal do Ceará (UFC)
collection Repositório Institucional da Universidade Federal do Ceará (UFC)
repository.name.fl_str_mv Repositório Institucional da Universidade Federal do Ceará (UFC) - Universidade Federal do Ceará (UFC)
repository.mail.fl_str_mv bu@ufc.br || repositorio@ufc.br
_version_ 1813029052662939648