Linguistic and orthographical classic Portuguese variants. Challenges for NLP

Cameron, Helena; Gonçalves, Maria Filomena; Quaresma, Paulo

Bibliographic details
Main author:	Cameron, Helena
Publication date:	2020
Other authors:	Gonçalves, Maria Filomena; Quaresma, Paulo
Document type:	Article
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/10174/28061
Abstract:	In recent times, a great investment has been made in transferring physical ancient Portuguese texts to digital media. This transfer not only gives the general public access to the texts but also makes them readable and processable by machines. NLP tools mainly target contemporary Portuguese, and applying them to classical texts poses several difficulties. Building large lexical corpora of forms that predate modern Portuguese is an opportunity for a multidisciplinary field of study: it broadens linguistic research and makes it possible to obtain, through NLP, validated corpora, collections and ontologies that can serve as input to NLP tools for ancient Portuguese texts. In this work we briefly present the problem of lexical variation of forms in processing classical Portuguese texts, the challenges that emerge from it, and perspectives for future work.

Item metadata

id	RCAP_17b04bbef4e4f796249bf37576bfd939
oai_identifier_str	oai:dspace.uevora.pt:10174/28061
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
dc.title.none.fl_str_mv	Linguistic and orthographical classic Portuguese variants. Challenges for NLP
title	Linguistic and orthographical classic Portuguese variants. Challenges for NLP
author	Cameron, Helena
author_facet	Cameron, Helena; Gonçalves, Maria Filomena; Quaresma, Paulo
author_role	author
author2	Gonçalves, Maria Filomena; Quaresma, Paulo
author2_role	author; author
dc.contributor.author.fl_str_mv	Cameron, Helena; Gonçalves, Maria Filomena; Quaresma, Paulo
dc.subject.por.fl_str_mv	Classical Portuguese; NLP
topic	Classical Portuguese; NLP
publishDate	2020
dc.date.none.fl_str_mv	2020-08-10T15:28:06Z 2020-08-10 2020-03-01T00:00:00Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10174/28061
url	http://hdl.handle.net/10174/28061
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	Cameron, Helena Freire; Gonçalves, Maria Filomena; Quaresma, Paulo (2020): "Linguistic and orthographical classic Portuguese variants. Challenges for NLP". In: Maria José Finatto, Renata Vieira, Senja Pollak and Saturnino Luz (eds.), Proceedings of the Workshop on Digital Humanities and Natural Language Processing, co-located with International Conference on the Computational Processing of Portuguese (PROPOR 2020), vol. 2607. Évora (Portugal): CEUR-WP org, 43-48. ISSN 1613-0073; http://ceur-ws.org/Vol-2607/short1.pdf; DLL; helenafc@uevora.pt; mfg@uevora.pt; pq@uevora.pt; 619
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.publisher.none.fl_str_mv	CEUR-WP org.
publisher.none.fl_str_mv	CEUR-WP org.
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_	1799136661795241984
