Recognition of online handwritten mathematical expressions using contextual information

Aguilar, Frank Dennis Julca

Recognition of online handwritten mathematical expressions using contextual information

Detalhes bibliográficos
Autor(a) principal:	Aguilar, Frank Dennis Julca
Data de Publicação:	2016
Tipo de documento:	Tese
Idioma:	eng
Título da fonte:	Biblioteca Digital de Teses e Dissertações da USP
Texto Completo:	http://www.teses.usp.br/teses/disponiveis/45/45134/tde-25072016-164800/
Resumo:	Online handwritten mathematical expressions consist of sequences of strokes, usually collected through a touch screen device. Automatic recognition of online handwritten mathematical expressions requires solving three subproblems: symbol segmentation, symbol classification, and structural analysis (that is, the identification of spatial relations, as subscript or superscript, between symbols). A main issue in the recognition process is ambiguity at symbol or relation levels that often leads to several likely interpretations of an expression. Some methods treat the recognition problem as a pipeline process, in which symbol segmentation and classification is followed by structural analysis. A main drawback of such methods is that they compute symbol level interpretations without considering structural information, which is essential to solve ambiguities. To cope with this drawback, more recent methods adapt string parsing techniques to drive the recognition process. As string grammars were originally designed to model linear arrangements of objects (like in text, where symbols are arranged only through left-to-right relations), non-linear arrangements of mathematical symbols (given by the multiple relation types of mathematics) are modeled as compositions of production rules for linear structures. Then, parsing an expression involves searching for linear structures in the expression that are consistent with the structure of the production rules. This last step requires the introduction of constraints or assumptions, such as stroke input order or vertical and horizontal alignments, to linearize the expression components. These requirements not only limit the effectiveness of the methods, but also make difficult their extension to include new expression structures. In this thesis, we model the recognition problem as a graph parsing problem. The graph-based description of relations in the production rules allows direct modeling of non-linear mathematical structures. Our parsing algorithm determines recursive partitions of the input strokes that induce graphs matching the production rule graphs. To mitigate the computational cost, we constrain the possible partitions to graphs derived from sets of symbol and relation hypotheses, calculated using previously trained classifiers. A set of labels that indicate likely interpretations is associated to each symbol and relation hypothesis, and treatment of ambiguity at symbol and relation levels is left to the parsing process. The parsing algorithm builds a forest in which each tree corresponds to an interpretation coherent with the grammar. We define a score function, optimized through training data, that associates a cost to each tree. We then select a tree with minimum cost as result. Experimental evaluation shows that the proposed method is more accurate than several state of the art methods. Even though graph parsing is a computationally expensive process, the use of symbol and relation hypotheses to constrain the search space is able to effectively reduce complexity, allowing practical application of the process. Furthermore, since the proposed parsing algorithm does not make direct use of structural particularities of mathematical expressions, it has potential to be adapted for other two-dimensional object recognition problems. As a secondary contribution of this thesis, we have proposed a framework to automatize the process of building handwritten mathematical expression datasets. The framework has been implemented in a computer system and used to generate part of the samples used in the experimental part of this thesis.

Metadados do item

id	USP_e20ab1a7b3ee41e7745d9e3e5b1b47ed
oai_identifier_str	oai:teses.usp.br:tde-25072016-164800
network_acronym_str	USP
network_name_str	Biblioteca Digital de Teses e Dissertações da USP
repository_id_str	2721
spelling	Recognition of online handwritten mathematical expressions using contextual informationReconhecimento online de expressões matemáticas manuscritas usando informação contextualContextual informationGraph parsingInformação contextualMathematical expression recognitionParsing de grafosReconhecimento de expressões matemáticasOnline handwritten mathematical expressions consist of sequences of strokes, usually collected through a touch screen device. Automatic recognition of online handwritten mathematical expressions requires solving three subproblems: symbol segmentation, symbol classification, and structural analysis (that is, the identification of spatial relations, as subscript or superscript, between symbols). A main issue in the recognition process is ambiguity at symbol or relation levels that often leads to several likely interpretations of an expression. Some methods treat the recognition problem as a pipeline process, in which symbol segmentation and classification is followed by structural analysis. A main drawback of such methods is that they compute symbol level interpretations without considering structural information, which is essential to solve ambiguities. To cope with this drawback, more recent methods adapt string parsing techniques to drive the recognition process. As string grammars were originally designed to model linear arrangements of objects (like in text, where symbols are arranged only through left-to-right relations), non-linear arrangements of mathematical symbols (given by the multiple relation types of mathematics) are modeled as compositions of production rules for linear structures. Then, parsing an expression involves searching for linear structures in the expression that are consistent with the structure of the production rules. This last step requires the introduction of constraints or assumptions, such as stroke input order or vertical and horizontal alignments, to linearize the expression components. These requirements not only limit the effectiveness of the methods, but also make difficult their extension to include new expression structures. In this thesis, we model the recognition problem as a graph parsing problem. The graph-based description of relations in the production rules allows direct modeling of non-linear mathematical structures. Our parsing algorithm determines recursive partitions of the input strokes that induce graphs matching the production rule graphs. To mitigate the computational cost, we constrain the possible partitions to graphs derived from sets of symbol and relation hypotheses, calculated using previously trained classifiers. A set of labels that indicate likely interpretations is associated to each symbol and relation hypothesis, and treatment of ambiguity at symbol and relation levels is left to the parsing process. The parsing algorithm builds a forest in which each tree corresponds to an interpretation coherent with the grammar. We define a score function, optimized through training data, that associates a cost to each tree. We then select a tree with minimum cost as result. Experimental evaluation shows that the proposed method is more accurate than several state of the art methods. Even though graph parsing is a computationally expensive process, the use of symbol and relation hypotheses to constrain the search space is able to effectively reduce complexity, allowing practical application of the process. Furthermore, since the proposed parsing algorithm does not make direct use of structural particularities of mathematical expressions, it has potential to be adapted for other two-dimensional object recognition problems. As a secondary contribution of this thesis, we have proposed a framework to automatize the process of building handwritten mathematical expression datasets. The framework has been implemented in a computer system and used to generate part of the samples used in the experimental part of this thesis.Expressões matemáticas manuscritas online estão constituídas por sequências de traços. O reconhecimento automático de tais expressões requer a solução de três subproblemas: segmentação de símbolos, classificação de símbolos e análise estrutural (isto é, a identificação de relações espaciais, tais como sobrescrito e subscrito, entre símbolos). Uma das dificuldades principais do problema é a ambiguidade no nível de símbolos ou relações, que frequentemente sugere várias possíveis interpretações de uma mesma expressão. Alguns métodos de reconhecimento tratam o problema de maneira sequencial, onde um processo de segmentação e classificação de símbolos é seguido de análise estrutural. Um problema principal de tais métodos é que eles determinam interpretações no nível de símbolos sem considerar informação estrutural, a qual é importante para solucionar ambiguidades. Para solucionar esse problema, métodos mais recentes adaptaram técnicas de parsing de strings. Dado que gramáticas de strings foram originalmente projetadas para modelar arranjos lineares de tokens (como texto, onde símbolos são arranjados de esquerda a direita), a estrutura não linear dos símbolos matemáticos (dada pelos multiples tipos de relações espaciais) é modelada como uma composição de regras de produção de estruturas lineares. Dessa maneira, o parsing de uma expressão consiste em determinar estruturas lineares na expressão que são consistentes com as estruturas das regras de produção. Esse último passo requer a introdução de restrições, baseadas na definição de uma ordem em relação ao tempo ou espaço, para linearizar os componentes da expresão. Os requerimentos das gramáticas de strings não apenas limitam a efectividade dos métodos, mas também dificultam a extensão dos métodos na inclusão de novas estruturas. Neste trabalho, o problema de reconhecimento de expressões matemáticas é modelado como um problema de parsing de grafos. A representação por meio de grafos nas regras de produção permite uma representação direta das estruturas não lineares das expressões matemáticas. O algoritmo de parsing determina partições dos traços de entrada que induzem grafos isomorfos aos grafos das regras de produção. Para mitigar o custo computacional, restringimos as possíveis partições a aquelas derivadas de um conjunto de possíveis símbolos e relações identificados por classificadores previamente treinados. Um conjunto de rótulos que indica interpretações alternativas é associado a cada símbolo e relação; a decisão da melhor interpretação é realizada pelo parser. O parser construi uma floresta na qual uma árvore representa uma possível interpretação da entrada, e atribui um custo de interpretação para cada árvore, baseado nas relações e símbolos definidas na árvore. O resultado do reconhecimento é dado pela extração de uma árvore com custo mínimo. Resultados experimentais do método proposto mostram um melhor desempenho em comparação com vários métodos descritos na literatura. A pesar do parsing de grafos ser um processo computacionalmente caro, a restrição do espaço de busca proposto reduz a complexidade o suficiente para permitir uma aplicação prática da abordagem. Adicionalmente, dado que a abordagem não pressupõe estruturas particulares das expressões matemática, o método tem potencial para ser adaptado para o reconhecimento de outras estruturas bidimensionais. Uma contribuição secundaria deste trabalho é o desenvolvimento de uma framework para construção automática de bancos de dados de expressões matemáticas manuscritas. A framework tem sido implementada num sistema usado para criar parte das amostras de expressões usadas para avaliação do método de reconhecimento.Biblioteca Digitais de Teses e Dissertações da USPHirata, Nina Sumiko TomitaAguilar, Frank Dennis Julca2016-04-29info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/doctoralThesisapplication/pdfhttp://www.teses.usp.br/teses/disponiveis/45/45134/tde-25072016-164800/reponame:Biblioteca Digital de Teses e Dissertações da USPinstname:Universidade de São Paulo (USP)instacron:USPLiberar o conteúdo para acesso público.info:eu-repo/semantics/openAccesseng2017-09-04T21:05:35Zoai:teses.usp.br:tde-25072016-164800Biblioteca Digital de Teses e Dissertaçõeshttp://www.teses.usp.br/PUBhttp://www.teses.usp.br/cgi-bin/mtd2br.plvirginia@if.usp.br\|\| atendimento@aguia.usp.br\|\|virginia@if.usp.bropendoar:27212017-09-04T21:05:35Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)false
dc.title.none.fl_str_mv	Recognition of online handwritten mathematical expressions using contextual information Reconhecimento online de expressões matemáticas manuscritas usando informação contextual
title	Recognition of online handwritten mathematical expressions using contextual information
spellingShingle	Recognition of online handwritten mathematical expressions using contextual information Aguilar, Frank Dennis Julca Contextual information Graph parsing Informação contextual Mathematical expression recognition Parsing de grafos Reconhecimento de expressões matemáticas
title_short	Recognition of online handwritten mathematical expressions using contextual information
title_full	Recognition of online handwritten mathematical expressions using contextual information
title_fullStr	Recognition of online handwritten mathematical expressions using contextual information
title_full_unstemmed	Recognition of online handwritten mathematical expressions using contextual information
title_sort	Recognition of online handwritten mathematical expressions using contextual information
author	Aguilar, Frank Dennis Julca
author_facet	Aguilar, Frank Dennis Julca
author_role	author
dc.contributor.none.fl_str_mv	Hirata, Nina Sumiko Tomita
dc.contributor.author.fl_str_mv	Aguilar, Frank Dennis Julca
dc.subject.por.fl_str_mv	Contextual information Graph parsing Informação contextual Mathematical expression recognition Parsing de grafos Reconhecimento de expressões matemáticas
topic	Contextual information Graph parsing Informação contextual Mathematical expression recognition Parsing de grafos Reconhecimento de expressões matemáticas
description	Online handwritten mathematical expressions consist of sequences of strokes, usually collected through a touch screen device. Automatic recognition of online handwritten mathematical expressions requires solving three subproblems: symbol segmentation, symbol classification, and structural analysis (that is, the identification of spatial relations, as subscript or superscript, between symbols). A main issue in the recognition process is ambiguity at symbol or relation levels that often leads to several likely interpretations of an expression. Some methods treat the recognition problem as a pipeline process, in which symbol segmentation and classification is followed by structural analysis. A main drawback of such methods is that they compute symbol level interpretations without considering structural information, which is essential to solve ambiguities. To cope with this drawback, more recent methods adapt string parsing techniques to drive the recognition process. As string grammars were originally designed to model linear arrangements of objects (like in text, where symbols are arranged only through left-to-right relations), non-linear arrangements of mathematical symbols (given by the multiple relation types of mathematics) are modeled as compositions of production rules for linear structures. Then, parsing an expression involves searching for linear structures in the expression that are consistent with the structure of the production rules. This last step requires the introduction of constraints or assumptions, such as stroke input order or vertical and horizontal alignments, to linearize the expression components. These requirements not only limit the effectiveness of the methods, but also make difficult their extension to include new expression structures. In this thesis, we model the recognition problem as a graph parsing problem. The graph-based description of relations in the production rules allows direct modeling of non-linear mathematical structures. Our parsing algorithm determines recursive partitions of the input strokes that induce graphs matching the production rule graphs. To mitigate the computational cost, we constrain the possible partitions to graphs derived from sets of symbol and relation hypotheses, calculated using previously trained classifiers. A set of labels that indicate likely interpretations is associated to each symbol and relation hypothesis, and treatment of ambiguity at symbol and relation levels is left to the parsing process. The parsing algorithm builds a forest in which each tree corresponds to an interpretation coherent with the grammar. We define a score function, optimized through training data, that associates a cost to each tree. We then select a tree with minimum cost as result. Experimental evaluation shows that the proposed method is more accurate than several state of the art methods. Even though graph parsing is a computationally expensive process, the use of symbol and relation hypotheses to constrain the search space is able to effectively reduce complexity, allowing practical application of the process. Furthermore, since the proposed parsing algorithm does not make direct use of structural particularities of mathematical expressions, it has potential to be adapted for other two-dimensional object recognition problems. As a secondary contribution of this thesis, we have proposed a framework to automatize the process of building handwritten mathematical expression datasets. The framework has been implemented in a computer system and used to generate part of the samples used in the experimental part of this thesis.
publishDate	2016
dc.date.none.fl_str_mv	2016-04-29
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/doctoralThesis
format	doctoralThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://www.teses.usp.br/teses/disponiveis/45/45134/tde-25072016-164800/
url	http://www.teses.usp.br/teses/disponiveis/45/45134/tde-25072016-164800/
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv	Liberar o conteúdo para acesso público. info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Liberar o conteúdo para acesso público.
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv	Biblioteca Digitais de Teses e Dissertações da USP
publisher.none.fl_str_mv	Biblioteca Digitais de Teses e Dissertações da USP
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP
instname_str	Universidade de São Paulo (USP)
instacron_str	USP
institution	USP
reponame_str	Biblioteca Digital de Teses e Dissertações da USP
collection	Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv	virginia@if.usp.br\|\| atendimento@aguia.usp.br\|\|virginia@if.usp.br
_version_	1815257334574743552

Recognition of online handwritten mathematical expressions using contextual information

Registros relacionados