Encoding polylexical units with TEI Lex-0

Bibliographic details
Main author: Tasovac, Toma
Publication date: 2020
Other authors: Salgado, Ana; Costa, Rute
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10362/110959
Abstract: UIDB/03213/2020 UIDP/03213/2020
id RCAP_d43d27cf3e1419691336a77477a7f4c3
oai_identifier_str oai:run.unl.pt:10362/110959
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str 7160
spelling Encoding polylexical units with TEI Lex-0
A case study
Interoperability
Language Resources
Lexicography
Polylexical Units
TEI
Language and Linguistics
Linguistics and Language
UIDB/03213/2020 UIDP/03213/2020
The modelling and encoding of polylexical units, i.e. recurrent sequences of lexemes that are perceived as independent lexical units, is a topic that has not been covered adequately and in sufficient depth by the Guidelines of the Text Encoding Initiative (TEI), a de facto standard for the digital representation of textual resources in the scholarly research community. In this paper, we use the Dictionary of the Portuguese Academy of Sciences as a case study for presenting our ongoing work on encoding polylexical units using TEI Lex-0, an initiative aimed at simplifying and streamlining the encoding of lexical data with TEI in order to improve interoperability. We introduce the notion of macro- and microstructural relevance to differentiate between polylexicals that serve as headwords for their own independent dictionary entries and those which appear inside entries for different headwords. We develop the notion of lexicographic transparency to distinguish between those units which are not accompanied by an explicit definition and those that are: the former are encoded as <form>-like constructs, whereas the latter become <entry>-like constructs, which can have further constraints imposed on them (sense numbers, domain labels, grammatical labels etc.). We codify the use of attributes on <gram> to encode different kinds of labels for polylexicals (implicit, explicit and normalised), concluding that the interoperability of lexical resources would be significantly improved if dictionary encoders had access to an expressive but relatively simple typology of polylexical units.
Centro de Linguística da UNL (CLUNL)
RUN
Tasovac, Toma
Salgado, Ana
Costa, Rute
2021-01-29T23:34:57Z
2020-08-10
2020-08-10T00:00:00Z
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/article
30
application/pdf
http://hdl.handle.net/10362/110959
eng
2335-2736
PURE: 26206622
https://doi.org/10.4312/SLO2.0.2020.2.28-57
info:eu-repo/semantics/openAccess
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
2024-03-11T04:54:49Z
oai:run.unl.pt:10362/110959
Portal Agregador
ONG
https://www.rcaap.pt/oai/openaire
opendoar:7160
2024-03-20T03:41:45.819179
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
false
dc.title.none.fl_str_mv Encoding polylexical units with TEI Lex-0
A case study
title Encoding polylexical units with TEI Lex-0
spellingShingle Encoding polylexical units with TEI Lex-0
Tasovac, Toma
Interoperability
Language Resources
Lexicography
Polylexical Units
TEI
Language and Linguistics
Linguistics and Language
title_short Encoding polylexical units with TEI Lex-0
title_full Encoding polylexical units with TEI Lex-0
title_fullStr Encoding polylexical units with TEI Lex-0
title_full_unstemmed Encoding polylexical units with TEI Lex-0
title_sort Encoding polylexical units with TEI Lex-0
author Tasovac, Toma
author_facet Tasovac, Toma
Salgado, Ana
Costa, Rute
author_role author
author2 Salgado, Ana
Costa, Rute
author2_role author
author
dc.contributor.none.fl_str_mv Centro de Linguística da UNL (CLUNL)
RUN
dc.contributor.author.fl_str_mv Tasovac, Toma
Salgado, Ana
Costa, Rute
dc.subject.por.fl_str_mv Interoperability
Language Resources
Lexicography
Polylexical Units
TEI
Language and Linguistics
Linguistics and Language
topic Interoperability
Language Resources
Lexicography
Polylexical Units
TEI
Language and Linguistics
Linguistics and Language
description UIDB/03213/2020 UIDP/03213/2020
publishDate 2020
dc.date.none.fl_str_mv 2020-08-10
2020-08-10T00:00:00Z
2021-01-29T23:34:57Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/110959
url http://hdl.handle.net/10362/110959
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 2335-2736
PURE: 26206622
https://doi.org/10.4312/SLO2.0.2020.2.28-57
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv 30
application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799138030377762816