Encoding polylexical units with TEI Lex-0
Main author: | Tasovac, Toma |
---|---|
Publication date: | 2020 |
Other authors: | Salgado, Ana; Costa, Rute |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10362/110959 |
Abstract: | UIDB/03213/2020 UIDP/03213/2020 |
id |
RCAP_d43d27cf3e1419691336a77477a7f4c3 |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/110959 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Encoding polylexical units with TEI Lex-0 A case study Interoperability Language Resources Lexicography Polylexical Units TEI Language and Linguistics Linguistics and Language UIDB/03213/2020 UIDP/03213/2020 The modelling and encoding of polylexical units, i.e. recurrent sequences of lexemes that are perceived as independent lexical units, is a topic that has not been covered adequately and in sufficient depth by the Guidelines of the Text Encoding Initiative (TEI), a de facto standard for the digital representation of textual resources in the scholarly research community. In this paper, we use the Dictionary of the Portuguese Academy of Sciences as a case study for presenting our ongoing work on encoding polylexical units using TEI Lex-0, an initiative aimed at simplifying and streamlining the encoding of lexical data with TEI in order to improve interoperability. We introduce the notion of macro- and microstructural relevance to differentiate between polylexicals that serve as headwords for their own independent dictionary entries and those which appear inside entries for different headwords. We develop the notion of lexicographic transparency to distinguish between those units which are not accompanied by an explicit definition and those that are: the former are encoded as <form>-like constructs, whereas the latter become <entry>-like constructs, which can have further constraints imposed on them (sense numbers, domain labels, grammatical labels etc.).
We codify the use of attributes on <gram> to encode different kinds of labels for polylexicals (implicit, explicit and normalised), concluding that the interoperability of lexical resources would be significantly improved if dictionary encoders had access to an expressive but relatively simple typology of polylexical units. Centro de Linguística da UNL (CLUNL) RUN Tasovac, Toma Salgado, Ana Costa, Rute 2021-01-29T23:34:57Z 2020-08-10 2020-08-10T00:00:00Z info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/article 30 application/pdf http://hdl.handle.net/10362/110959 eng 2335-2736 PURE: 26206622 https://doi.org/10.4312/SLO2.0.2020.2.28-57 info:eu-repo/semantics/openAccess reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP 2024-03-11T04:54:49Z oai:run.unl.pt:10362/110959 Portal Agregador ONG https://www.rcaap.pt/oai/openaire opendoar:7160 2024-03-20T03:41:45.819179 Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação false |
dc.title.none.fl_str_mv |
Encoding polylexical units with TEI Lex-0 A case study |
title |
Encoding polylexical units with TEI Lex-0 |
spellingShingle |
Encoding polylexical units with TEI Lex-0 Tasovac, Toma Interoperability Language Resources Lexicography Polylexical Units TEI Language and Linguistics Linguistics and Language |
title_short |
Encoding polylexical units with TEI Lex-0 |
title_full |
Encoding polylexical units with TEI Lex-0 |
title_fullStr |
Encoding polylexical units with TEI Lex-0 |
title_full_unstemmed |
Encoding polylexical units with TEI Lex-0 |
title_sort |
Encoding polylexical units with TEI Lex-0 |
author |
Tasovac, Toma |
author_facet |
Tasovac, Toma Salgado, Ana Costa, Rute |
author_role |
author |
author2 |
Salgado, Ana Costa, Rute |
author2_role |
author author |
dc.contributor.none.fl_str_mv |
Centro de Linguística da UNL (CLUNL) RUN |
dc.contributor.author.fl_str_mv |
Tasovac, Toma Salgado, Ana Costa, Rute |
dc.subject.por.fl_str_mv |
Interoperability Language Resources Lexicography Polylexical Units TEI Language and Linguistics Linguistics and Language |
topic |
Interoperability Language Resources Lexicography Polylexical Units TEI Language and Linguistics Linguistics and Language |
description |
UIDB/03213/2020 UIDP/03213/2020 |
publishDate |
2020 |
dc.date.none.fl_str_mv |
2020-08-10 2020-08-10T00:00:00Z 2021-01-29T23:34:57Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/110959 |
url |
http://hdl.handle.net/10362/110959 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
2335-2736 PURE: 26206622 https://doi.org/10.4312/SLO2.0.2020.2.28-57 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
30 application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799138030377762816 |