Learning parts-of-speech through distributional analysis. Further results from Brazilian Portuguese

Bibliographic details
Main author: Faria, Pablo
Publication date: 2019
Document type: Article
Language: por
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: https://doi.org/10.21814/diacritica.415
Abstract: A child learning a language has to figure out what the syntactic, or part-of-speech, categories in her language are and assign words to one or more of them. The question we aim to answer here is how much of this learning can be accomplished through the distributional analysis of utterances. To this end, Redington, Chater and Finch's (1998) computational model was reimplemented and applied to Brazilian Portuguese input data, obtained from publicly available corpora of both child-directed and adult-to-adult speech. Results from all experiments are presented and discussed. These experiments investigate many variables and aspects involved in this learning task: types of distributional contexts, the number of target and context words, the value of distributional information for different categories, corpus size, etc. A comparison between child-directed speech and adult-to-adult speech is also carried out. In general, our results support those of Redington et al. (1998), although we find some possibly important, and maybe contradictory, differences. We also evaluate the cosine metric, comparing its performance with that of the Spearman rank correlation metric used in Redington et al.'s (1998) study. The latter seems to produce better performance. In this paper we focus on a quantitative analysis of our results.
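As a rough illustration of the kind of distributional analysis the abstract describes, the sketch below builds context-count vectors for a few target words and compares them with the two similarity metrics it names, cosine and Spearman rank correlation. This is a minimal sketch under stated assumptions, not the author's implementation: the toy utterances, the one-word window before and after the target, and every function name are hypothetical, and the record does not give the model's actual settings.

```python
from math import sqrt

def context_vectors(utterances, targets, contexts, positions=(-1, 1)):
    """For each target word, count how often each context word occurs at
    each relative position (assumed here: one word before, one after)."""
    index = {}
    for p in positions:
        for c in contexts:
            index[(p, c)] = len(index)
    vectors = {t: [0] * len(index) for t in targets}
    for utterance in utterances:
        words = utterance.split()
        for i, word in enumerate(words):
            if word not in vectors:
                continue
            for p in positions:
                j = i + p
                if 0 <= j < len(words) and (p, words[j]) in index:
                    vectors[word][index[(p, words[j])]] += 1
    return vectors

def cosine(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def ranks(xs):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation (0.0 if either input has no variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def spearman(u, v):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(u), ranks(v))

# Toy utterances invented for illustration; they are not from the paper's corpora.
utterances = ["o gato dorme", "a menina dorme", "o gato come", "a menina come"]
targets = ["gato", "menina", "dorme", "come"]
contexts = ["o", "a", "gato", "menina", "dorme", "come"]

vectors = context_vectors(utterances, targets, contexts)
for a, b in [("gato", "menina"), ("dorme", "come"), ("gato", "dorme")]:
    print(a, b,
          round(cosine(vectors[a], vectors[b]), 3),
          round(spearman(vectors[a], vectors[b]), 3))
```

In this toy setting the two verbs receive identical context vectors, so both metrics rate them as maximally similar, while the two nouns share only part of their contexts. A full model would presumably feed such pairwise similarities into a clustering step, which is left out here.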
id RCAP_42a553fb4132f41a55db29527c34f76c
oai_identifier_str oai:journals.uminho.pt:article/5063
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Learning parts-of-speech through distributional analysis. Further results from Brazilian Portuguese
Aprendizagem de categorias de palavras por análise distribucional: resultados adicionais para Português Brasileiro
title Learning parts-of-speech through distributional analysis. Further results from Brazilian Portuguese
spellingShingle Learning parts-of-speech through distributional analysis. Further results from Brazilian Portuguese
Faria, Pablo
Language acquisition
Part-of-speech learning
Distributional analysis
Cognitive modelling
Aquisição da linguagem
Aprendizagem de categorias
Análise distribucional
Modelagem Cognitiva
title_short Learning parts-of-speech through distributional analysis. Further results from Brazilian Portuguese
title_full Learning parts-of-speech through distributional analysis. Further results from Brazilian Portuguese
title_fullStr Learning parts-of-speech through distributional analysis. Further results from Brazilian Portuguese
title_full_unstemmed Learning parts-of-speech through distributional analysis. Further results from Brazilian Portuguese
title_sort Learning parts-of-speech through distributional analysis. Further results from Brazilian Portuguese
author Faria, Pablo
author_facet Faria, Pablo
author_role author
dc.contributor.author.fl_str_mv Faria, Pablo
dc.subject.por.fl_str_mv Language acquisition
Part-of-speech learning
Distributional analysis
Cognitive modelling
Aquisição da linguagem
Aprendizagem de categorias
Análise distribucional
Modelagem Cognitiva
topic Language acquisition
Part-of-speech learning
Distributional analysis
Cognitive modelling
Aquisição da linguagem
Aprendizagem de categorias
Análise distribucional
Modelagem Cognitiva
description A child learning a language has to figure out what the syntactic, or part-of-speech, categories in her language are and assign words to one or more of them. The question we aim to answer here is how much of this learning can be accomplished through the distributional analysis of utterances. To this end, Redington, Chater and Finch's (1998) computational model was reimplemented and applied to Brazilian Portuguese input data, obtained from publicly available corpora of both child-directed and adult-to-adult speech. Results from all experiments are presented and discussed. These experiments investigate many variables and aspects involved in this learning task: types of distributional contexts, the number of target and context words, the value of distributional information for different categories, corpus size, etc. A comparison between child-directed speech and adult-to-adult speech is also carried out. In general, our results support those of Redington et al. (1998), although we find some possibly important, and maybe contradictory, differences. We also evaluate the cosine metric, comparing its performance with that of the Spearman rank correlation metric used in Redington et al.'s (1998) study. The latter seems to produce better performance. In this paper we focus on a quantitative analysis of our results.
publishDate 2019
dc.date.none.fl_str_mv 2019-12-16
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://doi.org/10.21814/diacritica.415
https://doi.org/10.21814/diacritica.415
url https://doi.org/10.21814/diacritica.415
dc.language.iso.fl_str_mv por
language por
dc.relation.none.fl_str_mv https://revistas.uminho.pt/index.php/diacritica/article/view/5063
https://revistas.uminho.pt/index.php/diacritica/article/view/5063/5536
dc.rights.driver.fl_str_mv Copyright (c) 2023 Pablo Faria
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Copyright (c) 2023 Pablo Faria
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv CEHUM
publisher.none.fl_str_mv CEHUM
dc.source.none.fl_str_mv Diacrítica; Vol. 33 N.º 2 (2019): Linguística Experimental e Variedades do Português; 229-251
Diacrítica; Vol. 33 No. 2 (2019): Experimental Linguistics and Portuguese Language Varieties; 229-251
2183-9174
0870-8967
10.21814/diacritica.33.2
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799132073422749696