Benchmarking natural language inference and semantic textual similarity for Portuguese

Bibliographic details
Main author: Fialho, Pedro
Publication date: 2020
Other authors: Coheur, Luísa; Quaresma, Paulo
Document type: Article
Language: por
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10174/32114
Abstract: Two sentences can be related in many different ways, and distinct natural language processing tasks aim to identify different semantic relations between them. We developed several models for natural language inference and semantic textual similarity for Portuguese. We took advantage of pre-trained models (BERT) and also studied the role of lexical features. We tested our models on several datasets (ASSIN, SICK-BR and ASSIN2); the best results were usually achieved with ptBERT-Large, trained on a Brazilian corpus and fine-tuned on these datasets. Besides obtaining state-of-the-art results, this is, to the best of our knowledge, the most comprehensive study of natural language inference and semantic textual similarity for Portuguese.
oai_identifier_str oai:dspace.uevora.pt:10174/32114
dc.date.none.fl_str_mv 2020-01-01T00:00:00Z
2022-05-30T11:00:06Z
2022-05-30
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.relation.none.fl_str_mv Pedro Fialho, Luísa Coheur, and Paulo Quaresma. Benchmarking natural language inference and semantic textual similarity for Portuguese. Information, 11(10), 2020.
pq@uevora.pt
283
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.publisher.none.fl_str_mv MDPI
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP