A flexible compositional approach to word sense disambiguation

Alex de Paula Barros

Bibliographic details
Main author:	Alex de Paula Barros
Publication date:	2018
Document type:	Master's thesis
Language:	Portuguese (por)
Source title:	Repositório Institucional da UFMG
Full text:	http://hdl.handle.net/1843/SLSC-BBKGTM
Abstract:	Word sense disambiguation is the task of identifying which sense of a word is used in a sentence when the word has multiple meanings. Supervised machine learning methods, in which a classifier is trained for each distinct word on a corpus of manually sense-annotated examples, have been the most successful algorithms to date. One possible drawback is their lack of flexibility, since they require annotated examples for every word in the vocabulary. In contrast, knowledge-based methods do not require a classifier for each distinct word and are often built over lexico-semantic resources such as ontologies, thesauri, or machine-readable dictionaries. In this work, we propose a flexible compositional algorithm based on context-gloss comparisons: it compares the local context of a word, represented by its neighboring words, with the glosses of the senses the word can assume, using a semantic distance measure. The algorithm has three components, each based on a different information source: (i) sense frequency, obtained by counting the number of times a word occurs with each meaning in an annotated corpus; (ii) extended gloss, obtained by expanding a word's dictionary definition using related words in an ontology (e.g., car and automobile); and (iii) sense usage examples, obtained from inventories that provide sentences with usage examples for some senses. Our compositional approach is flexible in the sense that it does not depend on annotated examples and works well even when some or all of the three aforementioned knowledge sources are not available. We evaluated the performance of our algorithm for all possible combinations of the three components, simulating different scenarios of knowledge-source availability. The algorithm achieves an F1 score of 67.5 when all components are available, which compares favorably with a state-of-the-art knowledge-based system that achieves an F1 score of 66.4.
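
The context-gloss comparison and its three optional components can be illustrated with a minimal Python sketch. This is not the thesis implementation: the abstract does not say how the components are combined or which semantic distance measure is used, so the additive scoring, the Jaccard word overlap (standing in for the semantic distance measure), the tiny stopword list, and all names (`Sense`, `tokens`, `overlap`, `disambiguate`, and the toy `BANK_SENSES` data) are assumptions made only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Tiny illustrative stopword list (an assumption, not taken from the thesis).
STOPWORDS = {"a", "an", "the", "of", "and", "that", "at", "on",
             "he", "she", "they", "from", "in", "to"}


@dataclass
class Sense:
    name: str
    gloss: str                                          # dictionary definition
    related: list[str] = field(default_factory=list)    # (ii) ontology neighbors
    examples: list[str] = field(default_factory=list)   # (iii) usage sentences
    frequency: int = 0                                  # (i) count in an annotated corpus


def tokens(text: str) -> set[str]:
    """Lowercase, strip punctuation, and drop stopwords."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def overlap(a: set[str], b: set[str]) -> float:
    """Jaccard similarity, a stand-in for the semantic distance measure."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def disambiguate(context: str, senses: list[Sense], use_frequency: bool = True,
                 use_extended_gloss: bool = True, use_examples: bool = True) -> Sense:
    """Pick the sense whose knowledge sources best match the context.

    Each flag switches one component off, mimicking a scenario in which
    that knowledge source is unavailable.
    """
    ctx = tokens(context)
    total = sum(s.frequency for s in senses)
    best, best_score = senses[0], float("-inf")
    for sense in senses:
        gloss_words = tokens(sense.gloss)
        if use_extended_gloss:
            gloss_words |= {w.lower() for w in sense.related}
        score = overlap(ctx, gloss_words)
        if use_examples and sense.examples:
            score += max(overlap(ctx, tokens(e)) for e in sense.examples)
        if use_frequency and total > 0:
            score += sense.frequency / total
        if score > best_score:  # ties keep the earlier sense
            best, best_score = sense, score
    return best


# Toy sense inventory for "bank" (invented data, for illustration only).
BANK_SENSES = [
    Sense("bank.financial",
          "a financial institution that accepts deposits and lends money",
          related=["money", "deposit"],
          examples=["she deposited money at the bank"],
          frequency=6),
    Sense("bank.river",
          "sloping land beside a body of water",
          related=["river", "shore"],
          examples=["they fished from the bank of the river"],
          frequency=4),
]

sentence = "He sat on the bank of the river and watched the water"
print(disambiguate(sentence, BANK_SENSES).name)
```

With the toy data, the river sense wins for the example sentence whether or not the optional components are enabled, while a context with no distinguishing words falls back to the more frequent sense. Keeping each component behind its own flag is one simple way to evaluate every combination of knowledge sources, as the abstract describes.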

Item metadata

id	UFMG_8ac78be0d9806641f90bc3eb4b7411ed
oai_identifier_str	oai:repositorio.ufmg.br:1843/SLSC-BBKGTM
network_acronym_str	UFMG
network_name_str	Repositório Institucional da UFMG
repository_id_str
dc.title.none.fl_str_mv	A flexible compositional approach to word sense disambiguation
author	Alex de Paula Barros
author_facet	Alex de Paula Barros
author_role	author
dc.contributor.none.fl_str_mv	Nivio Ziviani; Adriano Alonso Veloso; Flavio Vinicius Diniz de Figueiredo; Renato Antonio Celso Ferreira; Wladmir Cardoso Brandão
dc.contributor.author.fl_str_mv	Alex de Paula Barros
dc.subject.por.fl_str_mv	Natural Language Processing; Word Sense Disambiguation; Information retrieval; Computing; Natural language processing (Computing)
topic	Natural Language Processing; Word Sense Disambiguation; Information retrieval; Computing; Natural language processing (Computing)
publishDate	2018
dc.date.none.fl_str_mv	2018-07-27 2019-08-10T12:28:57Z 2019-08-10T12:28:57Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/1843/SLSC-BBKGTM
url	http://hdl.handle.net/1843/SLSC-BBKGTM
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Federal de Minas Gerais UFMG
publisher.none.fl_str_mv	Universidade Federal de Minas Gerais UFMG
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFMG instname:Universidade Federal de Minas Gerais (UFMG) instacron:UFMG
instname_str	Universidade Federal de Minas Gerais (UFMG)
instacron_str	UFMG
institution	UFMG
reponame_str	Repositório Institucional da UFMG
collection	Repositório Institucional da UFMG
repository.name.fl_str_mv	Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG)
repository.mail.fl_str_mv	repositorio@ufmg.br
_version_	1816829648184541184
