Learning parts-of-speech through distributional analysis. Further results from Brazilian Portuguese
Main author: | Faria, Pablo |
---|---|
Publication date: | 2019 |
Document type: | Article |
Language: | Portuguese (por) |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | https://doi.org/10.21814/diacritica.415 |
Abstract: | A child learning a language has to figure out which syntactic (part-of-speech) categories her language has and assign each word to one or more of them. The question we address here is how much of this learning can be accomplished through distributional analysis of utterances. To this end, we reimplemented the computational model of Redington, Chater and Finch (1998) and applied it to Brazilian Portuguese input data drawn from publicly available corpora of both child-directed and adult-to-adult speech. Results from all experiments are presented and discussed. The experiments investigate several variables involved in this learning task: the type of distributional context, the number of target and context words, the value of distributional information for different categories, and corpus size, among others. Child-directed and adult-to-adult speech are also compared. In general, our results support those of Redington et al. (1998), although we find some possibly important, and perhaps contradictory, differences. We also evaluate the cosine metric, comparing its performance with that of the Spearman rank correlation used by Redington et al. (1998); the latter appears to perform better. In this paper we focus on a quantitative analysis of our results. |
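The abstract names the ingredients of the distributional model (contexts, target words, context words, and a similarity metric) without showing how they combine. The sketch below is a minimal illustration under stated assumptions, not the author's reimplementation: it assumes contexts are the two word positions on each side of a target, uses a hypothetical four-sentence toy corpus, and compares the two metrics the abstract mentions, cosine and Spearman rank correlation (with tied ranks averaged). All function names (`context_vectors`, `cosine`, `average_ranks`, `pearson`, `spearman`) are illustrative.

```python
import math

# Assumed context window: two positions before and two after the target word.
WINDOW = (-2, -1, 1, 2)


def context_vectors(sentences, targets, contexts, window=WINDOW):
    """Return {target: count vector}, one dimension per (position, context word)."""
    dims = [(p, c) for p in window for c in contexts]
    index = {dim: i for i, dim in enumerate(dims)}
    vectors = {t: [0] * len(dims) for t in targets}
    for sent in sentences:
        for i, word in enumerate(sent):
            if word not in vectors:
                continue
            for p in window:
                j = i + p
                # Count the neighbour only if it lies inside the utterance
                # and is one of the chosen context words.
                if 0 <= j < len(sent) and (p, sent[j]) in index:
                    vectors[word][index[(p, sent[j])]] += 1
    return vectors


def cosine(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_ranks(xs):
    """1-based ranks, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and xs[order[end + 1]] == xs[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation (0.0 if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(u, v):
    """Spearman rank correlation: Pearson correlation of the averaged ranks."""
    return pearson(average_ranks(u), average_ranks(v))


# Hypothetical toy input; a real run would use child-directed speech corpora.
corpus = [s.split() for s in (
    "o menino come a maçã",
    "a menina come o bolo",
    "o menino bebe o suco",
    "a menina bebe a água",
)]
vecs = context_vectors(corpus, targets=["menino", "menina", "come", "bebe"],
                       contexts=["o", "a"])

for w1, w2 in [("menino", "menina"), ("menino", "come"), ("come", "bebe")]:
    print(w1, w2, round(cosine(vecs[w1], vecs[w2]), 3),
          round(spearman(vecs[w1], vecs[w2]), 3))
```

In this toy run `come` and `bebe` receive identical context vectors, and the noun pair `menino`/`menina` scores higher than the noun-verb pair `menino`/`come` under both metrics. Averaging tied ranks matters for Spearman here because count vectors over a small context vocabulary are mostly zeros.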
Record ID: | RCAP_42a553fb4132f41a55db29527c34f76c |
---|---|
OAI identifier: | oai:journals.uminho.pt:article/5063 |
Network: | RCAP, Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Repository ID: | 7160 |
Aggregator: | Portal Agregador (https://www.rcaap.pt/oai/openaire), OpenDOAR 7160 |
Record timestamps: | 2023-07-28T07:47:58Z; 2024-03-19T18:34:41.567426 |
Title (Portuguese): | Aprendizagem de categorias de palavras por análise distribucional: resultados adicionais para Português Brasileiro |
Author: | Faria, Pablo |
Keywords: | Language acquisition; Part-of-speech learning; Distributional analysis; Cognitive modelling |
Publication date: | 2019-12-16 |
Version: | Published version |
Type: | Article |
Related URLs: | https://revistas.uminho.pt/index.php/diacritica/article/view/5063; https://revistas.uminho.pt/index.php/diacritica/article/view/5063/5536 |
Rights: | Copyright (c) 2023 Pablo Faria; open access |
Format: | application/pdf |
Publisher: | CEHUM |
Source: | Diacrítica, Vol. 33, No. 2 (2019): Experimental Linguistics and Portuguese Language Varieties, pp. 229-251 |
ISSN: | 2183-9174; 0870-8967 |
Issue DOI: | 10.21814/diacritica.33.2 |
Institution: | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação (RCAAP) |