Intent-aware semantic query annotation

Bibliographic details
Main author: Rafael Glater da Cruz Machado
Publication date: 2017
Document type: Dissertation (master's thesis)
Language: eng
Source title: Repositório Institucional da UFMG
Full text: http://hdl.handle.net/1843/30489
Abstract: Query understanding is a challenging task primarily due to the inherent ambiguity of natural language. A common strategy for improving the understanding of natural language queries is to annotate them with semantic information mined from a knowledge base. Nevertheless, queries with different intents may arguably benefit from specialized annotation strategies. For instance, some queries could be effectively annotated with a single entity or an entity attribute, others could be better represented by a list of entities of a single type or by entities of multiple distinct types, and others may be simply ambiguous. In this dissertation, we propose a framework for learning semantic query annotations suitable to the target intent of each individual query. Thorough experiments on a publicly available benchmark show that our proposed approach can significantly improve state-of-the-art intent-agnostic approaches based on Markov random fields and learning to rank. Our results further demonstrate the consistent effectiveness of our approach for queries of various target intents, lengths, and difficulty levels, as well as its robustness to noise in intent detection.
id UFMG_10bc8515af6b58236862beb24dc4d3f8
oai_identifier_str oai:repositorio.ufmg.br:1843/30489
network_acronym_str UFMG
network_name_str Repositório Institucional da UFMG
repository_id_str
spelling ORIGINAL RafaelGlaterdaCruzMachado.pdf application/pdf 2242627
LICENSE license.txt text/plain; charset=utf-8 2119
TEXT RafaelGlaterdaCruzMachado.pdf.txt (Extracted text) text/plain 134389
2019-11-14 12:32:16.97
Repositório de Publicações
https://repositorio.ufmg.br/oai
2019-11-14T15:32:16
dc.title.pt_BR.fl_str_mv Intent-aware semantic query annotation
dc.title.alternative.pt_BR.fl_str_mv Anotações semânticas em consultas baseada na intenção do usuário
title Intent-aware semantic query annotation
spellingShingle Intent-aware semantic query annotation
Rafael Glater da Cruz Machado
Aprendizado de ranqueamento
Recuperação da informação
Aprendizado de representações
Busca semântica
Anotação semântica em consultas
Aprendizado de ranqueamento
Recuperação de informação
title_short Intent-aware semantic query annotation
title_full Intent-aware semantic query annotation
title_fullStr Intent-aware semantic query annotation
title_full_unstemmed Intent-aware semantic query annotation
title_sort Intent-aware semantic query annotation
author Rafael Glater da Cruz Machado
author_facet Rafael Glater da Cruz Machado
author_role author
dc.contributor.advisor1.fl_str_mv Rodrygo Luis Teodoro Santos
dc.contributor.advisor1Lattes.fl_str_mv http://lattes.cnpq.br/1162362624079364
dc.contributor.advisor-co1.fl_str_mv Nivio Ziviani
dc.contributor.referee1.fl_str_mv Altigran Soares da Silva
dc.contributor.referee2.fl_str_mv Marcos André Gonçalves
dc.contributor.authorLattes.fl_str_mv http://lattes.cnpq.br/7329858225436491
dc.contributor.author.fl_str_mv Rafael Glater da Cruz Machado
contributor_str_mv Rodrygo Luis Teodoro Santos
Nivio Ziviani
Altigran Soares da Silva
Marcos André Gonçalves
dc.subject.por.fl_str_mv Aprendizado de ranqueamento
Recuperação da informação
Aprendizado de representações
Busca semântica
Anotação semântica em consultas
topic Aprendizado de ranqueamento
Recuperação da informação
Aprendizado de representações
Busca semântica
Anotação semântica em consultas
Aprendizado de ranqueamento
Recuperação de informação
dc.subject.other.pt_BR.fl_str_mv Aprendizado de ranqueamento
Recuperação de informação
description Query understanding is a challenging task primarily due to the inherent ambiguity of natural language. A common strategy for improving the understanding of natural language queries is to annotate them with semantic information mined from a knowledge base. Nevertheless, queries with different intents may arguably benefit from specialized annotation strategies. For instance, some queries could be effectively annotated with a single entity or an entity attribute, others could be better represented by a list of entities of a single type or by entities of multiple distinct types, and others may be simply ambiguous. In this dissertation, we propose a framework for learning semantic query annotations suitable to the target intent of each individual query. Thorough experiments on a publicly available benchmark show that our proposed approach can significantly improve state-of-the-art intent-agnostic approaches based on Markov random fields and learning to rank. Our results further demonstrate the consistent effectiveness of our approach for queries of various target intents, lengths, and difficulty levels, as well as its robustness to noise in intent detection.
publishDate 2017
dc.date.issued.fl_str_mv 2017-04-07
dc.date.accessioned.fl_str_mv 2019-10-17T20:20:51Z
dc.date.available.fl_str_mv 2019-10-17T20:20:51Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/1843/30489
url http://hdl.handle.net/1843/30489
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Universidade Federal de Minas Gerais
dc.publisher.program.fl_str_mv Programa de Pós-Graduação em Ciência da Computação
dc.publisher.initials.fl_str_mv UFMG
dc.publisher.country.fl_str_mv Brasil
publisher.none.fl_str_mv Universidade Federal de Minas Gerais
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFMG
instname:Universidade Federal de Minas Gerais (UFMG)
instacron:UFMG
instname_str Universidade Federal de Minas Gerais (UFMG)
instacron_str UFMG
institution UFMG
reponame_str Repositório Institucional da UFMG
collection Repositório Institucional da UFMG
bitstream.url.fl_str_mv https://repositorio.ufmg.br/bitstream/1843/30489/2/RafaelGlaterdaCruzMachado.pdf
https://repositorio.ufmg.br/bitstream/1843/30489/3/license.txt
https://repositorio.ufmg.br/bitstream/1843/30489/4/RafaelGlaterdaCruzMachado.pdf.txt
bitstream.checksum.fl_str_mv 3de9f1b5066a753f028d0dc53030b5a8
34badce4be7e31e3adb4575ae96af679
d6260160e608a9073a6f0b9f1df3d853
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG)
repository.mail.fl_str_mv
_version_ 1801677024955203584