Recognition of online handwritten mathematical expressions using contextual information

Bibliographic details
Main author: Aguilar, Frank Dennis Julca
Publication date: 2016
Document type: Doctoral thesis
Language: English
Source title: Biblioteca Digital de Teses e Dissertações da USP
Full text: http://www.teses.usp.br/teses/disponiveis/45/45134/tde-25072016-164800/
Abstract: Online handwritten mathematical expressions consist of sequences of strokes, usually collected through a touch-screen device. Automatic recognition of these expressions requires solving three subproblems: symbol segmentation, symbol classification, and structural analysis (that is, identifying the spatial relations, such as subscript or superscript, between symbols). A central issue in the recognition process is ambiguity at the symbol or relation level, which often leads to several plausible interpretations of an expression. Some methods treat recognition as a pipeline, in which symbol segmentation and classification are followed by structural analysis. A main drawback of such methods is that they compute symbol-level interpretations without considering structural information, which is essential for resolving ambiguities. To address this drawback, more recent methods adapt string parsing techniques to drive the recognition process. Because string grammars were originally designed to model linear arrangements of objects (as in text, where symbols are arranged only through left-to-right relations), the non-linear arrangements of mathematical symbols (arising from the multiple relation types in mathematics) are modeled as compositions of production rules for linear structures. Parsing an expression then involves searching for linear structures in the expression that are consistent with the structure of the production rules. This last step requires introducing constraints or assumptions, such as stroke input order or vertical and horizontal alignment, to linearize the expression components. These requirements not only limit the effectiveness of such methods but also make it difficult to extend them to new expression structures.
In this thesis, we model the recognition problem as a graph parsing problem. Describing relations as graphs in the production rules allows non-linear mathematical structures to be modeled directly. Our parsing algorithm determines recursive partitions of the input strokes that induce graphs matching the production rule graphs. To mitigate the computational cost, we restrict the possible partitions to graphs derived from sets of symbol and relation hypotheses computed by previously trained classifiers. Each symbol and relation hypothesis is associated with a set of labels indicating its likely interpretations, and resolving ambiguity at the symbol and relation levels is left to the parsing process. The parsing algorithm builds a forest in which each tree corresponds to an interpretation consistent with the grammar. We define a score function, optimized on training data, that assigns a cost to each tree, and we select a tree with minimum cost as the result. Experimental evaluation shows that the proposed method is more accurate than several state-of-the-art methods. Although graph parsing is computationally expensive, using symbol and relation hypotheses to constrain the search space effectively reduces complexity, making the approach practical. Furthermore, since the proposed parsing algorithm does not rely on structural particularities of mathematical expressions, it could potentially be adapted to other two-dimensional object recognition problems. As a secondary contribution, we propose a framework that automates the construction of handwritten mathematical expression datasets. The framework has been implemented in a computer system and used to generate part of the samples used in the experiments of this thesis.
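The search strategy summarized in the abstract (restrict stroke partitions to classifier hypotheses, score each candidate interpretation, keep the cheapest) can be illustrated with a toy Python sketch. It is not the thesis's graph-grammar parser. Everything below is a hypothetical assumption made for illustration: the functions `partitions`, `best_tree` and `recognize`, the tables `SYMBOL_HYPS` and `RELATION_HYPS`, and every label and cost in them. The sketch also assumes that an interpretation's cost is a plain sum of symbol and relation costs (instead of a trained score function), uses flat rather than recursive partitions, and, in place of production rules, only requires that the chosen relations connect all symbols into a tree, found with Kruskal's minimum-spanning-tree algorithm.

```python
"""Toy sketch: hypothesis-constrained stroke partitioning with minimum-cost selection.

Hypothetical simplification: no graph grammar; the only structural requirement
is that the chosen relations connect all symbols into a tree.
"""


def partitions(strokes, symbol_hyps):
    """Yield every partition of `strokes` whose blocks are all symbol hypotheses."""
    if not strokes:
        yield []
        return
    first = min(strokes)  # the block holding the lowest stroke is fixed first, so no partition repeats
    for block in symbol_hyps:
        if first in block and block <= strokes:
            for rest in partitions(strokes - block, symbol_hyps):
                yield [block] + rest


def best_tree(blocks, relation_hyps):
    """Kruskal's algorithm: cheapest relations linking all symbols, or None if impossible."""
    parent = {b: b for b in blocks}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = []
    for (a, b), labels in relation_hyps.items():
        if a in parent and b in parent:  # both symbols belong to this partition
            rel, cost = min(labels.items(), key=lambda kv: kv[1])
            edges.append((cost, a, b, rel))
    edges.sort(key=lambda e: e[0])

    total, chosen = 0.0, []
    for cost, a, b, rel in edges:
        ra, rb = find(a), find(b)
        if ra != rb:  # edge joins two separate components
            parent[ra] = rb
            chosen.append((a, rel, b))
            total += cost
    if len(chosen) != len(blocks) - 1:
        return None  # relation hypotheses leave some symbol disconnected
    return total, chosen


def recognize(n_strokes, symbol_hyps, relation_hyps):
    """Return (cost, labels, relations) of the cheapest interpretation, or None."""
    best = None
    for blocks in partitions(frozenset(range(n_strokes)), symbol_hyps):
        # Cheapest label per symbol, chosen independently (a simplification).
        labels = {b: min(symbol_hyps[b].items(), key=lambda kv: kv[1]) for b in blocks}
        tree = best_tree(blocks, relation_hyps)
        if tree is None:
            continue
        cost = sum(c for _, c in labels.values()) + tree[0]  # assumed additive cost
        if best is None or cost < best[0]:
            best = (cost, labels, tree[1])
    return best


# Hypothetical classifier output for "x^2" written with three strokes:
# strokes 0 and 1 form the "x" (alone they could read as ")" and "("), stroke 2 is "2".
SYMBOL_HYPS = {
    frozenset({0, 1}): {"x": 0.1},
    frozenset({0}): {")": 0.6},
    frozenset({1}): {"(": 0.7},
    frozenset({2}): {"2": 0.2, "z": 0.5},
}
RELATION_HYPS = {
    (frozenset({0, 1}), frozenset({2})): {"superscript": 0.3, "right": 0.6},
    (frozenset({0}), frozenset({1})): {"right": 0.2},
    (frozenset({1}), frozenset({2})): {"superscript": 0.4, "right": 0.5},
}

cost, labels, relations = recognize(3, SYMBOL_HYPS, RELATION_HYPS)
print(round(cost, 2), sorted(lbl for lbl, _ in labels.values()))  # prints: 0.6 ['2', 'x']
```

Of the five possible partitions of three strokes, only the two built from symbol hypotheses are explored, which is the kind of search-space reduction the abstract describes. The reading ")(" followed by "2" is rejected here only because its summed costs are higher; in the thesis, such decisions are made by the grammar-driven parser and its trained score function.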
Record details
Portuguese title: Reconhecimento online de expressões matemáticas manuscritas usando informação contextual
Advisor: Hirata, Nina Sumiko Tomita
Date: 2016-04-29
Keywords: Contextual information; Graph parsing; Mathematical expression recognition
Version: Published version
Format: application/pdf
Rights: Content released for public access (open access)
Publisher: Biblioteca Digital de Teses e Dissertações da USP
Institution: Universidade de São Paulo (USP)
OAI identifier: oai:teses.usp.br:tde-25072016-164800
Record ID: USP_e20ab1a7b3ee41e7745d9e3e5b1b47ed
OpenDOAR repository ID: 2721
Repository contact: virginia@if.usp.br, atendimento@aguia.usp.br
Record last updated: 2017-09-04T21:05:35Z