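SELEÇÃO DE VARIÁVEIS NA MINERAÇÃO DE DADOS AGRÍCOLAS: Uma abordagem baseada em análise de componentes principais (Variable Selection in Agricultural Data Mining: An approach based on principal component analysis)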

Jr., Juscelino Izidoro de Oliveira

Bibliographic details
Main author:	Jr., Juscelino Izidoro de Oliveira
Publication date:	2012
Document type:	Master's thesis (Dissertação)
Language:	Portuguese (por)
Source title:	Biblioteca Digital de Teses e Dissertações da UEPG
Full text:	http://tede2.uepg.br/jspui/handle/prefix/152
Abstract:	Multivariate data analysis lets the researcher examine how many attributes interact to influence the behavior of a response variable. Such analysis relies on models induced from experimental data sets. An important issue in inducing multivariate regressors and classifiers is the sample size, because it determines how reliable the model is when regressing or classifying the response variable. This work addresses the sample size issue through the theory of Probably Approximately Correct (PAC) learning, which originates in machine learning problems of model induction. Given the importance of agricultural modeling, this work presents two procedures for variable selection. Variable Selection by Principal Component Analysis is an unsupervised procedure that lets the researcher select the most relevant variables from agricultural data by considering the variation in the data. Variable Selection by Supervised Principal Component Analysis is a supervised procedure that follows the same process but concentrates the selection on the variables with the most influence on the behavior of the response variable. Both procedures allow sample complexity information to be exploited during variable selection. The procedures were evaluated in five experiments, which showed that the supervised procedure induced models that scored better, on average, than models induced from variables selected by the unsupervised procedure. The experiments also showed that the variables selected by both procedures had reduced multicollinearity indices.
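
This record does not describe the two procedures in detail, so the Python sketch below only illustrates the general idea and should not be read as the thesis's method. It assumes a loading-based rule (for each leading principal component of the correlation matrix, keep the unused variable with the largest absolute loading) and, for the supervised variant, a prior screening step that keeps only variables whose absolute correlation with the response reaches a threshold. The function names, the parameters `k` and `min_abs_corr`, and the default threshold are hypothetical.

```python
import numpy as np


def select_by_pca(X, k):
    """Unsupervised sketch: return k column indices of X.

    For each of the k leading principal components of the correlation
    matrix, pick the not-yet-selected variable with the largest
    absolute loading. Assumes no column of X is constant.
    """
    X = np.asarray(X, dtype=float)
    n_vars = X.shape[1]
    k = min(k, n_vars)
    if n_vars < 2:
        # Nothing to decompose with fewer than two variables.
        return list(range(k))
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # components by decreasing variance
    selected = []
    for comp in order[:k]:
        loadings = np.abs(eigvecs[:, comp])
        if selected:
            loadings[selected] = -np.inf      # never pick a variable twice
        selected.append(int(np.argmax(loadings)))
    return selected


def select_by_supervised_pca(X, y, k, min_abs_corr=0.3):
    """Supervised sketch: screen variables by |corr(x_j, y)|, then
    apply the unsupervised rule to the surviving columns.

    The threshold min_abs_corr is an illustrative assumption.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    keep = np.flatnonzero(np.abs(r) >= min_abs_corr)
    local = select_by_pca(X[:, keep], k)
    # Map positions in the screened matrix back to original column indices.
    return [int(keep[i]) for i in local]
```

Both functions return column indices into the original data matrix, so the selected variables can be passed directly to whatever regressor or classifier is being induced. Working on the correlation matrix rather than the covariance matrix is a design choice made here so that variables measured in different units contribute on a comparable scale.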

Item metadata

id	UEPG_a0c9cb1781701f76969dd9d7053da1e6
oai_identifier_str	oai:tede2.uepg.br:prefix/152
network_acronym_str	UEPG
network_name_str	Biblioteca Digital de Teses e Dissertações da UEPG
repository_id_str
spelling	Made available in DSpace on 2017-07-21T14:19:33Z (GMT). No. of bitstreams: 1. Juscelino Izidoro Oliveira.pdf: 622255 bytes, checksum: 54447b380bca4ea8e2360060669d5cff (MD5). Previous issue date: 2012-07-30. Coordenação de Aperfeiçoamento de Pessoal de Nível Superior. https://tede2.uepg.br/jspui/ http://tede2.uepg.br/oai/request
dc.title.por.fl_str_mv	SELEÇÃO DE VARIÁVEIS NA MINERAÇÃO DE DADOS AGRÍCOLAS: Uma abordagem baseada em análise de componentes principais
author	Jr., Juscelino Izidoro de Oliveira
author_role	author
dc.contributor.advisor1.fl_str_mv	Rocha, Jose Carlos Ferreira da
dc.contributor.advisor1ID.fl_str_mv	CPF:64502430900
dc.contributor.advisor1Lattes.fl_str_mv	http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4703018J8
dc.contributor.referee1.fl_str_mv	Mathias, Ivo Mario
dc.contributor.referee1ID.fl_str_mv	CPF:34108181972
dc.contributor.referee1Lattes.fl_str_mv	http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4705808H0
dc.contributor.referee2.fl_str_mv	Kikuti, Daniel
dc.contributor.referee2ID.fl_str_mv	CPF:00895718944
dc.contributor.referee2Lattes.fl_str_mv	http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4705442D9
dc.contributor.authorID.fl_str_mv	CPF:34222533866
dc.contributor.author.fl_str_mv	Jr., Juscelino Izidoro de Oliveira
contributor_str_mv	Rocha, Jose Carlos Ferreira da Mathias, Ivo Mario Kikuti, Daniel
dc.subject.por.fl_str_mv	complexidade da amostra; dados rotulados; redução de dimensionalidade
dc.subject.eng.fl_str_mv	sample complexity; labeled data; dimensionality reduction
dc.subject.cnpq.fl_str_mv	CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
publishDate	2012
dc.date.available.fl_str_mv	2012-10-17 2017-07-21T14:19:33Z
dc.date.issued.fl_str_mv	2012-07-30
dc.date.accessioned.fl_str_mv	2017-07-21T14:19:33Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.citation.fl_str_mv	JR., Juscelino Izidoro de Oliveira. SELEÇÃO DE VARIÁVEIS NA MINERAÇÃO DE DADOS AGRÍCOLAS: Uma abordagem baseada em análise de componentes principais. 2012. 88 f. Dissertação (Mestrado em Computação para Tecnologias em Agricultura) - UNIVERSIDADE ESTADUAL DE PONTA GROSSA, Ponta Grossa, 2012.
dc.identifier.uri.fl_str_mv	http://tede2.uepg.br/jspui/handle/prefix/152
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	UNIVERSIDADE ESTADUAL DE PONTA GROSSA
dc.publisher.program.fl_str_mv	Programa de Pós Graduação Computação Aplicada
dc.publisher.initials.fl_str_mv	UEPG
dc.publisher.country.fl_str_mv	BR
dc.publisher.department.fl_str_mv	Computação para Tecnologias em Agricultura
publisher.none.fl_str_mv	UNIVERSIDADE ESTADUAL DE PONTA GROSSA
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da UEPG instname:Universidade Estadual de Ponta Grossa (UEPG) instacron:UEPG
bitstream.url.fl_str_mv	http://tede2.uepg.br/jspui/bitstream/prefix/152/1/Juscelino%20Izidoro%20Oliveira.pdf
bitstream.checksum.fl_str_mv	54447b380bca4ea8e2360060669d5cff
bitstream.checksumAlgorithm.fl_str_mv	MD5
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da UEPG - Universidade Estadual de Ponta Grossa (UEPG)
repository.mail.fl_str_mv	bicen@uepg.br || mv_fidelis@yahoo.com.br
_version_	1809460447011667968
