Corpus-based Methodology for an Online Multilingual Collocations Dictionary: First Steps

Bibliographic details
Main author: Orenha-Ottaiano, Adriane [UNESP]
Publication date: 2021
Other authors: Garcia, Marcos; de Oliveira Silva, Maria Eugênia Olímpio; L'Homme, Marie-Claude; Ramos, Margarita Alonso; Valêncio, Carlos Roberto [UNESP]; Tenório, William [UNESP]
Document type: Conference paper
Language: eng
Source title: Repositório Institucional da UNESP
Full text: http://hdl.handle.net/11449/242228
Abstract: This paper describes the first steps of a corpus-based methodology for developing an online Platform for Multilingual Collocations Dictionaries (PLATCOL). The platform is designed to be customized for different target audiences according to their needs. It covers collocations with various syntactic structures, classified as verbal, adjectival, nominal, or adverbial. Part of its design, layout, and methodological procedures is based on the Bilingual Online Collocations Dictionary Platform (Orenha-Ottaiano, 2017). The methodology also combines automatic methods for extracting candidate collocations (Garcia et al., 2019a) with careful post-editing by lexicographers. The automatic approaches use NLP tools to annotate large corpora with lemmas, PoS tags, and dependency relations in five languages (English, French, Portuguese, Spanish and Chinese). Using these data, we apply statistical measures (Evert et al., 2017; Garcia et al., 2019b) and distributional semantics strategies to select the candidates (Garcia et al., 2019c) and retrieve corpus-based examples (Kilgarriff et al., 2008). We also rely on automatic definition extraction (Bond & Foster, 2013) so that collocations can be organized more effectively according to their specific senses.
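The candidate-selection step mentioned in the abstract can be illustrated with a minimal sketch. This record does not say which statistical measure the authors use, so pointwise mutual information (PMI) is assumed here purely for illustration. The function name `rank_by_pmi`, the `min_count` threshold, and the toy verb-object pairs are all hypothetical; in practice the pairs would come from a dependency-parsed, lemmatized corpus.

```python
import math
from collections import Counter


def rank_by_pmi(pairs, min_count=2):
    """Rank (head lemma, dependent lemma) pairs by pointwise mutual information.

    `pairs` is a list with one entry per observed dependency relation.
    Pairs seen fewer than `min_count` times are dropped, because PMI is
    unreliable for very rare events.
    """
    pair_counts = Counter(pairs)
    head_counts = Counter(head for head, _ in pairs)
    dep_counts = Counter(dep for _, dep in pairs)
    total = len(pairs)

    scored = []
    for (head, dep), n in pair_counts.items():
        if n < min_count:
            continue
        # PMI = log2( P(head, dep) / (P(head) * P(dep)) )
        pmi = math.log2(n * total / (head_counts[head] * dep_counts[dep]))
        scored.append(((head, dep), pmi))
    # Highest association first
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy verb-object pairs (hypothetical data, not taken from the paper's corpora)
pairs = (
    [("make", "decision")] * 4
    + [("take", "decision")] * 2
    + [("make", "cake")] * 2
    + [("take", "photo")] * 4
    + [("eat", "cake")] * 1
)
ranked = rank_by_pmi(pairs)
for (head, dep), score in ranked:
    print(f"{head} {dep}\t{score:.3f}")
```

A ranking like this would only produce candidates. According to the abstract, the final selection depends on post-editing by lexicographers.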
id UNSP_75860d42fe8f5723d90a0a9f1905cd36
oai_identifier_str oai:repositorio.unesp.br:11449/242228
network_acronym_str UNSP
network_name_str Repositório Institucional da UNESP
repository_id_str 2946
dc.title.none.fl_str_mv Corpus-based Methodology for an Online Multilingual Collocations Dictionary: First Steps
title Corpus-based Methodology for an Online Multilingual Collocations Dictionary: First Steps
author Orenha-Ottaiano, Adriane [UNESP]
author_facet Orenha-Ottaiano, Adriane [UNESP]
Garcia, Marcos
de Oliveira Silva, Maria Eugênia Olímpio
L'Homme, Marie-Claude
Ramos, Margarita Alonso
Valêncio, Carlos Roberto [UNESP]
Tenório, William [UNESP]
author_role author
author2 Garcia, Marcos
de Oliveira Silva, Maria Eugênia Olímpio
L'Homme, Marie-Claude
Ramos, Margarita Alonso
Valêncio, Carlos Roberto [UNESP]
Tenório, William [UNESP]
author2_role author
author
author
author
author
author
dc.contributor.none.fl_str_mv Universidade Estadual Paulista (UNESP)
Universidade de Santiago de Compostela
University of Alcalá
Université de Montréal
Universidade da Coruña
dc.contributor.author.fl_str_mv Orenha-Ottaiano, Adriane [UNESP]
Garcia, Marcos
de Oliveira Silva, Maria Eugênia Olímpio
L'Homme, Marie-Claude
Ramos, Margarita Alonso
Valêncio, Carlos Roberto [UNESP]
Tenório, William [UNESP]
dc.subject.por.fl_str_mv automatic extraction
collocations
collocations dictionary
lexicography
online platform
topic automatic extraction
collocations
collocations dictionary
lexicography
online platform
publishDate 2021
dc.date.none.fl_str_mv 2021-01-01
2023-03-02T12:09:18Z
2023-03-02T12:09:18Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/conferenceObject
format conferenceObject
status_str publishedVersion
dc.identifier.uri.fl_str_mv Proceedings of Electronic Lexicography in the 21st Century Conference, v. 2021-July, p. 1-28.
2533-5626
http://hdl.handle.net/11449/242228
2-s2.0-85137087660
identifier_str_mv Proceedings of Electronic Lexicography in the 21st Century Conference, v. 2021-July, p. 1-28.
2533-5626
2-s2.0-85137087660
url http://hdl.handle.net/11449/242228
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Proceedings of Electronic Lexicography in the 21st Century Conference
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 1-28
dc.source.none.fl_str_mv Scopus
reponame:Repositório Institucional da UNESP
instname:Universidade Estadual Paulista (UNESP)
instacron:UNESP
instname_str Universidade Estadual Paulista (UNESP)
instacron_str UNESP
institution UNESP
reponame_str Repositório Institucional da UNESP
collection Repositório Institucional da UNESP
repository.name.fl_str_mv Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)