Deep neural semantic parsing: translating from natural language into SPARQL

Luz, Fabiano Ferreira

Deep neural semantic parsing: translating from natural language into SPARQL

Detalhes bibliográficos
Autor(a) principal:	Luz, Fabiano Ferreira
Data de Publicação:	2019
Tipo de documento:	Tese
Idioma:	eng
Título da fonte:	Biblioteca Digital de Teses e Dissertações da USP
Texto Completo:	http://www.teses.usp.br/teses/disponiveis/45/45134/tde-01042019-101602/
Resumo:	Semantic parsing is the process of mapping a natural-language sentence into a machine-readable, formal representation of its meaning. The LSTM Encoder-Decoder is a neural architecture with the ability to map a source language into a target one. We are interested in the problem of mapping natural language into SPARQL queries, and we seek to contribute with strategies that do not rely on handcrafted rules, high-quality lexicons, manually-built templates or other handmade complex structures. In this context, we present two contributions to the problem of semantic parsing departing from the LSTM encoder-decoder. While natural language has well defined vector representation methods that use a very large volume of texts, formal languages, like SPARQL queries, suffer from lack of suitable methods for vector representation. In the first contribution we improve the representation of SPARQL vectors. We start by obtaining an alignment matrix between the two vocabularies, natural language and SPARQL terms, which allows us to refine a vectorial representation of SPARQL items. With this refinement we obtained better results in the posterior training for the semantic parsing model. In the second contribution we propose a neural architecture, that we call Encoder CFG-Decoder, whose output conforms to a given context-free grammar. Unlike the traditional LSTM encoder-decoder, our model provides a grammatical guarantee for the mapping process, which is particularly important for practical cases where grammatical errors can cause critical failures. Results confirm that any output generated by our model obeys the given CFG, and we observe a translation accuracy improvement when compared with other results from the literature.

Metadados do item

id	USP_a8da7d3fca6e615afc102e265c5e4cf1
oai_identifier_str	oai:teses.usp.br:tde-01042019-101602
network_acronym_str	USP
network_name_str	Biblioteca Digital de Teses e Dissertações da USP
repository_id_str	2721
spelling	Deep neural semantic parsing: translating from natural language into SPARQLAnálise semântica neural profunda: traduzindo de linguagem natural para SPARQLAnálise semânticaCFGCodificação decodificaçãoEncoder decoderGLCGramáticasGrammarsLSTMLSTMNLPOntologiasOntologyPalavras associadasPLNRDFRDFRNNRNNSemantic parsingSPARQLSPARQLWord embeddingsSemantic parsing is the process of mapping a natural-language sentence into a machine-readable, formal representation of its meaning. The LSTM Encoder-Decoder is a neural architecture with the ability to map a source language into a target one. We are interested in the problem of mapping natural language into SPARQL queries, and we seek to contribute with strategies that do not rely on handcrafted rules, high-quality lexicons, manually-built templates or other handmade complex structures. In this context, we present two contributions to the problem of semantic parsing departing from the LSTM encoder-decoder. While natural language has well defined vector representation methods that use a very large volume of texts, formal languages, like SPARQL queries, suffer from lack of suitable methods for vector representation. In the first contribution we improve the representation of SPARQL vectors. We start by obtaining an alignment matrix between the two vocabularies, natural language and SPARQL terms, which allows us to refine a vectorial representation of SPARQL items. With this refinement we obtained better results in the posterior training for the semantic parsing model. In the second contribution we propose a neural architecture, that we call Encoder CFG-Decoder, whose output conforms to a given context-free grammar. Unlike the traditional LSTM encoder-decoder, our model provides a grammatical guarantee for the mapping process, which is particularly important for practical cases where grammatical errors can cause critical failures. Results confirm that any output generated by our model obeys the given CFG, and we observe a translation accuracy improvement when compared with other results from the literature.A análise semântica é o processo de mapear uma sentença em linguagem natural para uma representação formal, interpretável por máquina, do seu significado. O LSTM Encoder-Decoder é uma arquitetura de rede neural com a capacidade de mapear uma sequência de origem para uma sequência de destino. Estamos interessados no problema de mapear a linguagem natural em consultas SPARQL e procuramos contribuir com estratégias que não dependam de regras artesanais, léxico de alta qualidade, modelos construídos manualmente ou outras estruturas complexas feitas à mão. Neste contexto, apresentamos duas contribuições para o problema de análise semântica partindo da arquitetura LSTM Encoder-Decoder. Enquanto para a linguagem natural existem métodos de representação vetorial bem definidos que usam um volume muito grande de textos, as linguagens formais, como as consultas SPARQL, sofrem com a falta de métodos adequados para representação vetorial. Na primeira contribuição, melhoramos a representação dos vetores SPARQL. Começamos obtendo uma matriz de alinhamento entre os dois vocabulários, linguagem natural e termos SPARQL, o que nos permite refinar uma representação vetorial dos termos SPARQL. Com esse refinamento, obtivemos melhores resultados no treinamento posterior para o modelo de análise semântica. Na segunda contribuição, propomos uma arquitetura neural, que chamamos de Encoder CFG-Decoder, cuja saída está de acordo com uma determinada gramática livre de contexto. Ao contrário do modelo tradicional LSTM Encoder-Decoder, nosso modelo fornece uma garantia gramatical para o processo de mapeamento, o que é particularmente importante para casos práticos nos quais erros gramaticais podem causar falhas críticas em um compilador ou interpretador. Os resultados confirmam que qualquer resultado gerado pelo nosso modelo obedece à CFG dada, e observamos uma melhora na precisão da tradução quando comparada com outros resultados da literatura.Biblioteca Digitais de Teses e Dissertações da USPFinger, MarceloLuz, Fabiano Ferreira2019-02-07info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/doctoralThesisapplication/pdfhttp://www.teses.usp.br/teses/disponiveis/45/45134/tde-01042019-101602/reponame:Biblioteca Digital de Teses e Dissertações da USPinstname:Universidade de São Paulo (USP)instacron:USPLiberar o conteúdo para acesso público.info:eu-repo/semantics/openAccesseng2019-06-07T17:00:33Zoai:teses.usp.br:tde-01042019-101602Biblioteca Digital de Teses e Dissertaçõeshttp://www.teses.usp.br/PUBhttp://www.teses.usp.br/cgi-bin/mtd2br.plvirginia@if.usp.br\|\| atendimento@aguia.usp.br\|\|virginia@if.usp.bropendoar:27212019-06-07T17:00:33Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)false
dc.title.none.fl_str_mv	Deep neural semantic parsing: translating from natural language into SPARQL Análise semântica neural profunda: traduzindo de linguagem natural para SPARQL
title	Deep neural semantic parsing: translating from natural language into SPARQL
spellingShingle	Deep neural semantic parsing: translating from natural language into SPARQL Luz, Fabiano Ferreira Análise semântica CFG Codificação decodificação Encoder decoder GLC Gramáticas Grammars LSTM LSTM NLP Ontologias Ontology Palavras associadas PLN RDF RDF RNN RNN Semantic parsing SPARQL SPARQL Word embeddings
title_short	Deep neural semantic parsing: translating from natural language into SPARQL
title_full	Deep neural semantic parsing: translating from natural language into SPARQL
title_fullStr	Deep neural semantic parsing: translating from natural language into SPARQL
title_full_unstemmed	Deep neural semantic parsing: translating from natural language into SPARQL
title_sort	Deep neural semantic parsing: translating from natural language into SPARQL
author	Luz, Fabiano Ferreira
author_facet	Luz, Fabiano Ferreira
author_role	author
dc.contributor.none.fl_str_mv	Finger, Marcelo
dc.contributor.author.fl_str_mv	Luz, Fabiano Ferreira
dc.subject.por.fl_str_mv	Análise semântica CFG Codificação decodificação Encoder decoder GLC Gramáticas Grammars LSTM LSTM NLP Ontologias Ontology Palavras associadas PLN RDF RDF RNN RNN Semantic parsing SPARQL SPARQL Word embeddings
topic	Análise semântica CFG Codificação decodificação Encoder decoder GLC Gramáticas Grammars LSTM LSTM NLP Ontologias Ontology Palavras associadas PLN RDF RDF RNN RNN Semantic parsing SPARQL SPARQL Word embeddings
description	Semantic parsing is the process of mapping a natural-language sentence into a machine-readable, formal representation of its meaning. The LSTM Encoder-Decoder is a neural architecture with the ability to map a source language into a target one. We are interested in the problem of mapping natural language into SPARQL queries, and we seek to contribute with strategies that do not rely on handcrafted rules, high-quality lexicons, manually-built templates or other handmade complex structures. In this context, we present two contributions to the problem of semantic parsing departing from the LSTM encoder-decoder. While natural language has well defined vector representation methods that use a very large volume of texts, formal languages, like SPARQL queries, suffer from lack of suitable methods for vector representation. In the first contribution we improve the representation of SPARQL vectors. We start by obtaining an alignment matrix between the two vocabularies, natural language and SPARQL terms, which allows us to refine a vectorial representation of SPARQL items. With this refinement we obtained better results in the posterior training for the semantic parsing model. In the second contribution we propose a neural architecture, that we call Encoder CFG-Decoder, whose output conforms to a given context-free grammar. Unlike the traditional LSTM encoder-decoder, our model provides a grammatical guarantee for the mapping process, which is particularly important for practical cases where grammatical errors can cause critical failures. Results confirm that any output generated by our model obeys the given CFG, and we observe a translation accuracy improvement when compared with other results from the literature.
publishDate	2019
dc.date.none.fl_str_mv	2019-02-07
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/doctoralThesis
format	doctoralThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://www.teses.usp.br/teses/disponiveis/45/45134/tde-01042019-101602/
url	http://www.teses.usp.br/teses/disponiveis/45/45134/tde-01042019-101602/
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv	Liberar o conteúdo para acesso público. info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Liberar o conteúdo para acesso público.
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv	Biblioteca Digitais de Teses e Dissertações da USP
publisher.none.fl_str_mv	Biblioteca Digitais de Teses e Dissertações da USP
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP
instname_str	Universidade de São Paulo (USP)
instacron_str	USP
institution	USP
reponame_str	Biblioteca Digital de Teses e Dissertações da USP
collection	Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv	virginia@if.usp.br\|\| atendimento@aguia.usp.br\|\|virginia@if.usp.br
_version_	1815256575287230464

Deep neural semantic parsing: translating from natural language into SPARQL

Registros relacionados