Linguistic and orthographical classic Portuguese variants. Challenges for NLP
Main author: | Cameron, Helena |
---|---|
Publication date: | 2020 |
Other authors: | Gonçalves, Maria Filomena; Quaresma, Paulo |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10174/28061 |
Abstract: | Recent years have seen considerable investment in transferring ancient Portuguese texts from physical to digital media. This transfer not only gives the general public access to the texts but also makes them readable and processable by machines. NLP tools, however, mainly target contemporary Portuguese, and applying them to classical texts raises several difficulties. Building large lexical corpora of forms that predate modern Portuguese is an opportunity for a multidisciplinary field of study: it broadens linguistic research and makes it possible to obtain, through NLP, validated corpora, collections and ontologies that can in turn serve as input to NLP tools for ancient Portuguese texts. In this work we briefly present the problem of lexical variation of forms in the processing of classical Portuguese texts, the challenges that arise from it, and perspectives for future work. |

id | RCAP_17b04bbef4e4f796249bf37576bfd939 |
---|---|
oai_identifier_str | oai:dspace.uevora.pt:10174/28061 |
network_acronym_str | RCAP |
network_name_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository_id_str | 7160 |
dc.title.none.fl_str_mv | Linguistic and orthographical classic Portuguese variants. Challenges for NLP |
author | Cameron, Helena |
author_facet | Cameron, Helena; Gonçalves, Maria Filomena; Quaresma, Paulo |
author_role | author |
author2 | Gonçalves, Maria Filomena; Quaresma, Paulo |
author2_role | author; author |
dc.contributor.author.fl_str_mv | Cameron, Helena; Gonçalves, Maria Filomena; Quaresma, Paulo |
dc.subject.por.fl_str_mv | Classical Portuguese; NLP |
topic | Classical Portuguese; NLP |
publishDate | 2020 |
dc.date.none.fl_str_mv | 2020-08-10T15:28:06Z; 2020-08-10; 2020-03-01T00:00:00Z |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/article |
format | article |
status_str | publishedVersion |
dc.identifier.uri.fl_str_mv | http://hdl.handle.net/10174/28061 |
url | http://hdl.handle.net/10174/28061 |
dc.language.iso.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | Cameron, Helena Freire; Gonçalves, Maria Filomena; Quaresma, Paulo (2020): "Linguistic and orthographical classic Portuguese variants. Challenges for NLP". In: Maria José Finatto, Renata Vieira, Senja Pollak and Saturnino Luz (ed.), Proceedings of the Workshop on Digital Humanities and Natural Language Processing, co-located with International Conference on the Computational Processing of Portuguese (PROPOR 2020), vol. 2607. Évora (Portugal): CEUR-WP org, 43-48. 1613-0073 http://ceur-ws.org/Vol-2607/short1.pdf DLL helenafc@uevora.pt mfg@uevora.pt pq@uevora.pt 619 |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
eu_rights_str_mv | openAccess |
dc.publisher.none.fl_str_mv | CEUR-WP org. |
publisher.none.fl_str_mv | CEUR-WP org. |
dc.source.none.fl_str_mv | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str | RCAAP |
institution | RCAAP |
reponame_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
collection | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository.name.fl_str_mv | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv | |
_version_ | 1799136661795241984 |