Corpus-based Methodology for an Online Multilingual Collocations Dictionary: First Steps
Main author: | Orenha-Ottaiano, Adriane [UNESP] |
---|---|
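Publication date: | 2021 |
Other authors: | Garcia, Marcos; de Oliveira Silva, Maria Eugênia Olímpio; L'Homme, Marie-Claude; Ramos, Margarita Alonso; Valêncio, Carlos Roberto [UNESP]; Tenório, William [UNESP] |
Document type: | Conference paper |
Language: | eng |
Source title: | Repositório Institucional da UNESP |
Full text: | http://hdl.handle.net/11449/242228 |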
Abstract: | This paper describes the first steps of a corpus-based methodology for the development of an online Platform for Multilingual Collocations Dictionaries (PLATCOL). The platform is aimed to be customized for different target audiences according to their needs. It covers various syntactic structures of collocations that fit into the following taxonomy: verbal, adjectival, nominal, and adverbial. Part of its design, layout and methodological procedures are based on the Bilingual Online Collocations Dictionary Platform (Orenha-Ottaiano, 2017). The methodology also relies on the combination of automatic methods to extract candidate collocations (Garcia et al., 2019a) with careful post-editing performed by lexicographers. The automatic approaches take advantage of NLP tools to annotate large corpora with lemmas, PoS-tags and dependency relations in five languages (English, French, Portuguese, Spanish and Chinese). Using these data, we apply statistical measures (Evert et al., 2017; Garcia et al., 2019b) and distributional semantics strategies to select the candidates (Garcia et al., 2019c) and retrieve corpus-based examples (Kilgarriff et al., 2008). We also rely on automatic definition extraction (Bond & Foster, 2013) so that collocations can be more effectively organized according to their specific senses. |
id |
UNSP_75860d42fe8f5723d90a0a9f1905cd36 |
---|---|
oai_identifier_str |
oai:repositorio.unesp.br:11449/242228 |
network_acronym_str |
UNSP |
network_name_str |
Repositório Institucional da UNESP |
repository_id_str |
2946 |
dc.title.none.fl_str_mv |
Corpus-based Methodology for an Online Multilingual Collocations Dictionary: First Steps |
author |
Orenha-Ottaiano, Adriane [UNESP] |
author_role |
author |
author2 |
Garcia, Marcos; de Oliveira Silva, Maria Eugênia Olímpio; L'Homme, Marie-Claude; Ramos, Margarita Alonso; Valêncio, Carlos Roberto [UNESP]; Tenório, William [UNESP] |
author2_role |
author author author author author author |
dc.contributor.none.fl_str_mv |
Universidade Estadual Paulista (UNESP); Universidade de Santiago de Compostela; University of Alcalá; Université de Montréal; Universidade da Coruña |
dc.contributor.author.fl_str_mv |
Orenha-Ottaiano, Adriane [UNESP]; Garcia, Marcos; de Oliveira Silva, Maria Eugênia Olímpio; L'Homme, Marie-Claude; Ramos, Margarita Alonso; Valêncio, Carlos Roberto [UNESP]; Tenório, William [UNESP] |
dc.subject.por.fl_str_mv |
automatic extraction; collocations; collocations dictionary; lexicography; online platform |
publishDate |
2021 |
dc.date.none.fl_str_mv |
2021-01-01 2023-03-02T12:09:18Z 2023-03-02T12:09:18Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/conferenceObject |
format |
conferenceObject |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
Proceedings of Electronic Lexicography in the 21st Century Conference, v. 2021-July, p. 1-28. 2533-5626 http://hdl.handle.net/11449/242228 2-s2.0-85137087660 |
url |
http://hdl.handle.net/11449/242228 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
Proceedings of Electronic Lexicography in the 21st Century Conference |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
1-28 |
dc.source.none.fl_str_mv |
Scopus reponame:Repositório Institucional da UNESP instname:Universidade Estadual Paulista (UNESP) instacron:UNESP |
instname_str |
Universidade Estadual Paulista (UNESP) |
instacron_str |
UNESP |
institution |
UNESP |
reponame_str |
Repositório Institucional da UNESP |
collection |
Repositório Institucional da UNESP |
repository.name.fl_str_mv |
Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP) |
_version_ |
1808129506711437312 |