Linguistic and orthographical classic Portuguese variants. Challenges for NLP

Bibliographic details
Main author: Cameron, Helena
Publication date: 2020
Other authors: Gonçalves, Maria Filomena, Quaresma, Paulo
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Texto Completo: http://hdl.handle.net/10174/28061
Abstract: In recent years, a great deal has been invested in transferring ancient Portuguese texts from physical to digital media. This transfer not only makes the texts accessible to the general public but also allows them to be read and processed by machines. NLP tools mainly target contemporary Portuguese, and applying NLP to classical texts raises several difficulties. Building large lexical corpora of forms that predate modern Portuguese is an opportunity for a multidisciplinary field of study: it broadens linguistic research and makes it possible to obtain, through NLP, validated corpora, collections and ontologies that can serve as input to NLP tools for ancient Portuguese texts. In this work we briefly present the problem of lexical variation of forms in processing classical Portuguese texts, the challenges that emerge from it, and perspectives for future work.
id RCAP_17b04bbef4e4f796249bf37576bfd939
oai_identifier_str oai:dspace.uevora.pt:10174/28061
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str 7160
dc.title.none.fl_str_mv Linguistic and orthographical classic Portuguese variants. Challenges for NLP
title Linguistic and orthographical classic Portuguese variants. Challenges for NLP
spellingShingle Linguistic and orthographical classic Portuguese variants. Challenges for NLP
Cameron, Helena
Classical Portuguese
NLP
title_short Linguistic and orthographical classic Portuguese variants. Challenges for NLP
title_full Linguistic and orthographical classic Portuguese variants. Challenges for NLP
title_fullStr Linguistic and orthographical classic Portuguese variants. Challenges for NLP
title_full_unstemmed Linguistic and orthographical classic Portuguese variants. Challenges for NLP
title_sort Linguistic and orthographical classic Portuguese variants. Challenges for NLP
author Cameron, Helena
author_facet Cameron, Helena
Gonçalves, Maria Filomena
Quaresma, Paulo
author_role author
author2 Gonçalves, Maria Filomena
Quaresma, Paulo
author2_role author
author
dc.contributor.author.fl_str_mv Cameron, Helena
Gonçalves, Maria Filomena
Quaresma, Paulo
dc.subject.por.fl_str_mv Classical Portuguese
NLP
topic Classical Portuguese
NLP
description In recent years, a great deal has been invested in transferring ancient Portuguese texts from physical to digital media. This transfer not only makes the texts accessible to the general public but also allows them to be read and processed by machines. NLP tools mainly target contemporary Portuguese, and applying NLP to classical texts raises several difficulties. Building large lexical corpora of forms that predate modern Portuguese is an opportunity for a multidisciplinary field of study: it broadens linguistic research and makes it possible to obtain, through NLP, validated corpora, collections and ontologies that can serve as input to NLP tools for ancient Portuguese texts. In this work we briefly present the problem of lexical variation of forms in processing classical Portuguese texts, the challenges that emerge from it, and perspectives for future work.
publishDate 2020
dc.date.none.fl_str_mv 2020-08-10T15:28:06Z
2020-08-10
2020-03-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10174/28061
http://hdl.handle.net/10174/28061
url http://hdl.handle.net/10174/28061
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Cameron, Helena Freire; Gonçalves, Maria Filomena; Quaresma, Paulo (2020): "Linguistic and orthographical classic Portuguese variants. Challenges for NLP". In: Maria José Finatto, Renata Vieira, Senja Pollak and Saturnino Luz (eds.), Proceedings of the Workshop on Digital Humanities and Natural Language Processing, co-located with International Conference on the Computational Processing of Portuguese (PROPOR 2020), vol. 2607. Évora (Portugal): CEUR-WS.org, 43-48.
1613-0073
http://ceur-ws.org/Vol-2607/short1.pdf
DLL
helenafc@uevora.pt
mfg@uevora.pt
pq@uevora.pt
619
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv CEUR-WS.org
publisher.none.fl_str_mv CEUR-WS.org
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799136661795241984