Learning parts-of-speech through distributional analysis. Further results from Brazilian Portuguese

Faria, Pablo

Bibliographic details
Main author:	Faria, Pablo
Publication date:	2019
Document type:	Article
Language:	Portuguese (por)
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	https://doi.org/10.21814/diacritica.415
Abstract:	A child learning a language has to figure out what the syntactic, or part-of-speech, categories in her language are and assign words to one or more of them. The question we aim to answer here is how much of this learning can be accomplished through the distributional analysis of utterances. To this end, the computational model of Redington, Chater and Finch (1998) was reimplemented and applied to Brazilian Portuguese input data obtained from publicly available corpora of both child-directed and adult-to-adult speech. Results from all experiments are presented and discussed. These experiments investigate many variables and aspects of this learning task: types of distributional contexts, the number of target and context words, the value of distributional information for different categories, corpus size, etc. A comparison between child-directed speech and adult-to-adult speech is also carried out. In general, our results support those of Redington et al. (1998), although we find some possibly important, and maybe contradictory, differences. We also evaluate the cosine metric, comparing its performance with that of the Spearman rank correlation metric used in Redington et al.'s (1998) study. The latter seems to produce better performance. In this paper we focus on a quantitative analysis of our results.
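
The kind of distributional analysis the abstract describes can be illustrated with a minimal sketch. It is not the author's implementation: the toy utterances, the one-word window on each side of the target, and all function names (`context_vectors`, `cosine`, `ranks`, `spearman`) are assumptions made for illustration. It builds a context-count vector for each target word and compares two vectors with the two similarity measures named in the abstract, cosine and Spearman rank correlation.

```python
from math import sqrt


def context_vectors(utterances, targets, contexts):
    """For each target word, count how often each context word occurs
    immediately before it (first half of the vector) and immediately
    after it (second half). The one-word window is an assumption."""
    index = {w: i for i, w in enumerate(contexts)}
    n = len(contexts)
    vectors = {t: [0] * (2 * n) for t in targets}
    for utterance in utterances:
        words = utterance.split()
        for pos, word in enumerate(words):
            if word not in vectors:
                continue
            if pos > 0 and words[pos - 1] in index:
                vectors[word][index[words[pos - 1]]] += 1
            if pos + 1 < len(words) and words[pos + 1] in index:
                vectors[word][n + index[words[pos + 1]]] += 1
    return vectors


def cosine(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(u, v):
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    ru, rv = ranks(u), ranks(v)
    mu, mv = sum(ru) / len(ru), sum(rv) / len(rv)
    cov = sum((a - mu) * (b - mv) for a, b in zip(ru, rv))
    denom = sqrt(sum((a - mu) ** 2 for a in ru) * sum((b - mv) ** 2 for b in rv))
    return cov / denom if denom else 0.0


# Invented toy utterances, not taken from the corpora used in the article.
utterances = [
    "o gato dorme", "o cachorro dorme", "o gato come",
    "o cachorro come", "a menina dorme", "a menina come",
]
targets = ["gato", "cachorro", "menina", "dorme", "come"]
contexts = ["o", "a", "gato", "cachorro", "menina", "dorme", "come"]

vecs = context_vectors(utterances, targets, contexts)
for other in ("cachorro", "menina", "dorme"):
    print(other,
          round(cosine(vecs["gato"], vecs[other]), 3),
          round(spearman(vecs["gato"], vecs[other]), 3))
```

In this toy data the two nouns "gato" and "cachorro" occur in identical contexts, so both measures rate them as more similar to each other than either is to the verb "dorme". A fuller model would feed such pairwise similarities into a clustering step and score the resulting groups against gold part-of-speech labels.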

Item metadata

id	RCAP_42a553fb4132f41a55db29527c34f76c
oai_identifier_str	oai:journals.uminho.pt:article/5063
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
dc.title.none.fl_str_mv	Learning parts-of-speech through distributional analysis. Further results from brazilian portuguese Aprendizagem de categorias de palavras por análise distribucional resultados adicionais para Português Brasileiro
author	Faria, Pablo
author_facet	Faria, Pablo
author_role	author
dc.contributor.author.fl_str_mv	Faria, Pablo
dc.subject.por.fl_str_mv	Language acquisition; Part-of-speech learning; Distributional analysis; Cognitive modelling; Aquisição da linguagem; Aprendizagem de categorias; Análise distribucional; Modelagem Cognitiva
publishDate	2019
dc.date.none.fl_str_mv	2019-12-16
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://doi.org/10.21814/diacritica.415
url	https://doi.org/10.21814/diacritica.415
dc.language.iso.fl_str_mv	por
language	por
dc.relation.none.fl_str_mv	https://revistas.uminho.pt/index.php/diacritica/article/view/5063 https://revistas.uminho.pt/index.php/diacritica/article/view/5063/5536
dc.rights.driver.fl_str_mv	Copyright (c) 2023 Pablo Faria info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Copyright (c) 2023 Pablo Faria
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	CEHUM
publisher.none.fl_str_mv	CEHUM
dc.source.none.fl_str_mv	Diacrítica; Vol. 33 No. 2 (2019): Experimental Linguistics and Portuguese Language Varieties; 229-251 2183-9174 0870-8967 10.21814/diacritica.33.2 reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_	1799132073422749696
