Building new probability distributions: the composition method and a computer based method

Detalhes bibliográficos
Autor(a) principal: PINHO, Luis Gustavo Bastos
Data de Publicação: 2017
Tipo de documento: Tese
Idioma: eng
Título da fonte: Repositório Institucional da UFPE
dARK ID: ark:/64986/001300000grn9
Texto Completo: https://repositorio.ufpe.br/handle/123456789/24966
Resumo: We discuss the creation of new probability distributions for continuous data in two distinct approaches. The first one is, to our knowledge, novelty and consists of using Estimation of Distribution Algorithms (EDAs) to obtain new cumulative distribution functions. This class of algorithms work as follows. A population of solutions for a given problem is randomly selected from a space of candidates, which may contain candidates that are not feasible solutions to the problem. The selection occurs by following a set of probability rules that, initially, assign a uniform distribution to the space of candidates. Each individual is ranked by a fitness criterion. A fraction of the most fit individuals is selected and the probability rules are then adjusted to increase the likelihood of obtaining solutions similar to the most fit in the current population. The algorithm iterates until the set of probability rules are able to provide good solutions to the problem. In our proposal, the algorithm is used to generate cumulative distribution functions to model a given continuous data set. We tried to keep the mathematical expressions of the new functions as simple as possible. The results were satisfactory. We compared the models provided by the algorithm to the ones in already published papers. In every situation, the models proposed by the algorithms had advantages over the ones already published. The main advantage is the relative simplicity of the mathematical expressions obtained. Still in the context of computational tools and algorithms, we show the performance of simple neural networks as a method for parameter estimation in probability distributions. The motivation for this was the need to solve a large number of non linear equations when dealing with SAR images (SAR stands for synthetic aperture radar) in the statistical treatment of such images. The estimation process requires solving, iteratively, a non-linear equation. This is repeated for every pixel and an image usually consists of a large number of pixels. We trained a neural network to approximate an estimator for the parameter of interest. Once trained, the network can be fed the data and it will return an estimate of the parameter of interest without the need of iterative methods. The training of the network can take place even before collecting the data from the radar. The method was tested on simulated and real data sets with satisfactory results. The same method can be applied to different distributions. The second part of this thesis shows two new probability distribution classes obtained from the composition of already existing ones. In each situation, we present the new class and general results such as power series expansions for the probability density functions, expressions for the moments, entropy and alike. The first class is obtained from the composition of the beta-G and Lehmann-type II classes. The second class, from the transmuted-G and Marshall-Olkin-G classes. Distributions in these classes are compared to already existing ones as a way to illustrate the performance of applications to real data sets.
id UFPE_ec1e21f43ed657f2d271b71ce504e4a9
oai_identifier_str oai:repositorio.ufpe.br:123456789/24966
network_acronym_str UFPE
network_name_str Repositório Institucional da UFPE
repository_id_str 2221
spelling PINHO, Luis Gustavo Bastoshttp://lattes.cnpq.br/7587599315362185http://lattes.cnpq.br/3268732497595112CORDEIRO, Gauss MoutinhoNOBRE, Juvêncio Santos2018-07-03T21:14:00Z2018-07-03T21:14:00Z2017-01-17https://repositorio.ufpe.br/handle/123456789/24966ark:/64986/001300000grn9We discuss the creation of new probability distributions for continuous data in two distinct approaches. The first one is, to our knowledge, novelty and consists of using Estimation of Distribution Algorithms (EDAs) to obtain new cumulative distribution functions. This class of algorithms work as follows. A population of solutions for a given problem is randomly selected from a space of candidates, which may contain candidates that are not feasible solutions to the problem. The selection occurs by following a set of probability rules that, initially, assign a uniform distribution to the space of candidates. Each individual is ranked by a fitness criterion. A fraction of the most fit individuals is selected and the probability rules are then adjusted to increase the likelihood of obtaining solutions similar to the most fit in the current population. The algorithm iterates until the set of probability rules are able to provide good solutions to the problem. In our proposal, the algorithm is used to generate cumulative distribution functions to model a given continuous data set. We tried to keep the mathematical expressions of the new functions as simple as possible. The results were satisfactory. We compared the models provided by the algorithm to the ones in already published papers. In every situation, the models proposed by the algorithms had advantages over the ones already published. The main advantage is the relative simplicity of the mathematical expressions obtained. Still in the context of computational tools and algorithms, we show the performance of simple neural networks as a method for parameter estimation in probability distributions. The motivation for this was the need to solve a large number of non linear equations when dealing with SAR images (SAR stands for synthetic aperture radar) in the statistical treatment of such images. The estimation process requires solving, iteratively, a non-linear equation. This is repeated for every pixel and an image usually consists of a large number of pixels. We trained a neural network to approximate an estimator for the parameter of interest. Once trained, the network can be fed the data and it will return an estimate of the parameter of interest without the need of iterative methods. The training of the network can take place even before collecting the data from the radar. The method was tested on simulated and real data sets with satisfactory results. The same method can be applied to different distributions. The second part of this thesis shows two new probability distribution classes obtained from the composition of already existing ones. In each situation, we present the new class and general results such as power series expansions for the probability density functions, expressions for the moments, entropy and alike. The first class is obtained from the composition of the beta-G and Lehmann-type II classes. The second class, from the transmuted-G and Marshall-Olkin-G classes. Distributions in these classes are compared to already existing ones as a way to illustrate the performance of applications to real data sets.FACEPEDiscutimos a criação de novas distribuições de probabilidade para dados contínuos em duas abordagens distintas. A primeira é, ao nosso conhecimento, inédita e consiste em utilizar algoritmos de estimação de distribuição para a obtenção de novas funções de distribuição acumulada. Algoritmos de estimação de distribuição funcionam da seguinte forma. Uma população de soluções para um determinado problema é extraída aleatoriamente de um conjunto que denominamos espaço de candidatos, o qual pode possuir candidatos que não são soluções viáveis para o problema. A extração ocorre de acordo com um conjunto de regras de probabilidade, as quais inicialmente atribuem uma distribuição uniforme ao espaço de candidatos. Cada indivíduo na população é classificado de acordo com um critério de desempenho. Uma porção dos indivíduos com melhor desempenho é escolhida e o conjunto de regras é adaptado para aumentar a probabilidade de obter soluções similares aos melhores indivíduos da população atual. O processo é repetido por um número de gerações até que a distribuição de probabilidade das soluções sorteadas forneça soluções boas o suficiente. Em nossa aplicação, o problema consiste em obter uma função de distribuição acumulada para um conjunto de dados contínuos qualquer. Tentamos, durante o processo, manter as expressões matemáticas das distribuições geradas as mais simples possíveis. Os resultados foram satisfatórios. Comparamos os modelos providos pelo algoritmo a modelos publicados em outros artigos. Em todas as situações, os modelos obtidos pelo algoritmo apresentaram vantagens sobre os modelos dos artigos publicados. A principal vantagem é a expressão matemática reduzida. Ainda no contexto do uso de ferramentas computacionais e algoritmos, mostramos como utilizar redes neurais simples para a estimação de parâmetros em distribuições de probabilidade. A motivação para tal aplicação foi a necessidade de resolver iterativamente um grande número de equações não lineares no tratamento estatístico de imagens obtidas de SARs (synthetic aperture radar). O processo de estimação requer a solução de uma equação por métodos iterativos e isso é repetido para cada pixel na imagem. Cada imagem possui um grande número de pixels, em geral. Pensando nisso, treinamos uma rede neural para aproximar o estimador para esse parâmetro. Uma vez treinada, a rede é alimentada com as janelas referente a cada pixel e retorna uma estimativa do parâmetro, sem a necessidade de métodos iterativos. O treino ocorre antes mesmo da obtenção dos dados do radar. O método foi testado em conjuntos de dados reais e fictícios com ótimos resultados. O mesmo método pode ser aplicado a outras distribuições. A segunda parte da tese exibe duas classes de distribuições de probabilidade obtidas a partir da composição de classes existentes. Em cada caso, apresentamos a nova classe e resultados gerais tais como expansões em série de potência para a função densidade de probabilidade, expressões para momentos, entropias e similares. A primeira classe é a composição das classes beta-G e Lehmann-tipo II. A segunda classe é obtida a partir das classes transmuted-G e Marshall-Olkin-G. Distribuições pertencentes a essas classes são comparadas a outras já existentes como maneira de ilustrar o desempenho em aplicações a dados reais.engUniversidade Federal de PernambucoPrograma de Pos Graduacao em EstatisticaUFPEBrasilAttribution-NonCommercial-NoDerivs 3.0 Brazilhttp://creativecommons.org/licenses/by-nc-nd/3.0/br/info:eu-repo/semantics/openAccessEstatísticaProbabilidadeBuilding new probability distributions: the composition method and a computer based methodinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/doctoralThesisdoutoradoreponame:Repositório Institucional da UFPEinstname:Universidade Federal de Pernambuco (UFPE)instacron:UFPETHUMBNAILTESE Luis Gustavo Bastos Pinho.pdf.jpgTESE Luis Gustavo Bastos Pinho.pdf.jpgGenerated Thumbnailimage/jpeg1321https://repositorio.ufpe.br/bitstream/123456789/24966/5/TESE%20Luis%20Gustavo%20Bastos%20Pinho.pdf.jpg03b4a687d9eaca077238acb00afb2654MD55ORIGINALTESE Luis Gustavo Bastos Pinho.pdfTESE Luis Gustavo Bastos Pinho.pdfapplication/pdf3785410https://repositorio.ufpe.br/bitstream/123456789/24966/1/TESE%20Luis%20Gustavo%20Bastos%20Pinho.pdf4a1cf7340340bd8ff994a74abb62ba0eMD51CC-LICENSElicense_rdflicense_rdfapplication/rdf+xml; charset=utf-8811https://repositorio.ufpe.br/bitstream/123456789/24966/2/license_rdfe39d27027a6cc9cb039ad269a5db8e34MD52LICENSElicense.txtlicense.txttext/plain; charset=utf-82311https://repositorio.ufpe.br/bitstream/123456789/24966/3/license.txt4b8a02c7f2818eaf00dcf2260dd5eb08MD53TEXTTESE Luis Gustavo Bastos Pinho.pdf.txtTESE Luis Gustavo Bastos Pinho.pdf.txtExtracted texttext/plain153904https://repositorio.ufpe.br/bitstream/123456789/24966/4/TESE%20Luis%20Gustavo%20Bastos%20Pinho.pdf.txt4343b78ab6fcb7cbdc9338870f09e229MD54123456789/249662019-10-26 00:39:43.533oai:repositorio.ufpe.br:123456789/24966TGljZW7Dp2EgZGUgRGlzdHJpYnVpw6fDo28gTsOjbyBFeGNsdXNpdmEKClRvZG8gZGVwb3NpdGFudGUgZGUgbWF0ZXJpYWwgbm8gUmVwb3NpdMOzcmlvIEluc3RpdHVjaW9uYWwgKFJJKSBkZXZlIGNvbmNlZGVyLCDDoCBVbml2ZXJzaWRhZGUgRmVkZXJhbCBkZSBQZXJuYW1idWNvIChVRlBFKSwgdW1hIExpY2Vuw6dhIGRlIERpc3RyaWJ1acOnw6NvIE7Do28gRXhjbHVzaXZhIHBhcmEgbWFudGVyIGUgdG9ybmFyIGFjZXNzw612ZWlzIG9zIHNldXMgZG9jdW1lbnRvcywgZW0gZm9ybWF0byBkaWdpdGFsLCBuZXN0ZSByZXBvc2l0w7NyaW8uCgpDb20gYSBjb25jZXNzw6NvIGRlc3RhIGxpY2Vuw6dhIG7Do28gZXhjbHVzaXZhLCBvIGRlcG9zaXRhbnRlIG1hbnTDqW0gdG9kb3Mgb3MgZGlyZWl0b3MgZGUgYXV0b3IuCl9fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fXwoKTGljZW7Dp2EgZGUgRGlzdHJpYnVpw6fDo28gTsOjbyBFeGNsdXNpdmEKCkFvIGNvbmNvcmRhciBjb20gZXN0YSBsaWNlbsOnYSBlIGFjZWl0w6EtbGEsIHZvY8OqIChhdXRvciBvdSBkZXRlbnRvciBkb3MgZGlyZWl0b3MgYXV0b3JhaXMpOgoKYSkgRGVjbGFyYSBxdWUgY29uaGVjZSBhIHBvbMOtdGljYSBkZSBjb3B5cmlnaHQgZGEgZWRpdG9yYSBkbyBzZXUgZG9jdW1lbnRvOwpiKSBEZWNsYXJhIHF1ZSBjb25oZWNlIGUgYWNlaXRhIGFzIERpcmV0cml6ZXMgcGFyYSBvIFJlcG9zaXTDs3JpbyBJbnN0aXR1Y2lvbmFsIGRhIFVGUEU7CmMpIENvbmNlZGUgw6AgVUZQRSBvIGRpcmVpdG8gbsOjbyBleGNsdXNpdm8gZGUgYXJxdWl2YXIsIHJlcHJvZHV6aXIsIGNvbnZlcnRlciAoY29tbyBkZWZpbmlkbyBhIHNlZ3VpciksIGNvbXVuaWNhciBlL291IGRpc3RyaWJ1aXIsIG5vIFJJLCBvIGRvY3VtZW50byBlbnRyZWd1ZSAoaW5jbHVpbmRvIG8gcmVzdW1vL2Fic3RyYWN0KSBlbSBmb3JtYXRvIGRpZ2l0YWwgb3UgcG9yIG91dHJvIG1laW87CmQpIERlY2xhcmEgcXVlIGF1dG9yaXphIGEgVUZQRSBhIGFycXVpdmFyIG1haXMgZGUgdW1hIGPDs3BpYSBkZXN0ZSBkb2N1bWVudG8gZSBjb252ZXJ0w6otbG8sIHNlbSBhbHRlcmFyIG8gc2V1IGNvbnRlw7pkbywgcGFyYSBxdWFscXVlciBmb3JtYXRvIGRlIGZpY2hlaXJvLCBtZWlvIG91IHN1cG9ydGUsIHBhcmEgZWZlaXRvcyBkZSBzZWd1cmFuw6dhLCBwcmVzZXJ2YcOnw6NvIChiYWNrdXApIGUgYWNlc3NvOwplKSBEZWNsYXJhIHF1ZSBvIGRvY3VtZW50byBzdWJtZXRpZG8gw6kgbyBzZXUgdHJhYmFsaG8gb3JpZ2luYWwgZSBxdWUgZGV0w6ltIG8gZGlyZWl0byBkZSBjb25jZWRlciBhIHRlcmNlaXJvcyBvcyBkaXJlaXRvcyBjb250aWRvcyBuZXN0YSBsaWNlbsOnYS4gRGVjbGFyYSB0YW1iw6ltIHF1ZSBhIGVudHJlZ2EgZG8gZG9jdW1lbnRvIG7Do28gaW5mcmluZ2Ugb3MgZGlyZWl0b3MgZGUgb3V0cmEgcGVzc29hIG91IGVudGlkYWRlOwpmKSBEZWNsYXJhIHF1ZSwgbm8gY2FzbyBkbyBkb2N1bWVudG8gc3VibWV0aWRvIGNvbnRlciBtYXRlcmlhbCBkbyBxdWFsIG7Do28gZGV0w6ltIG9zIGRpcmVpdG9zIGRlCmF1dG9yLCBvYnRldmUgYSBhdXRvcml6YcOnw6NvIGlycmVzdHJpdGEgZG8gcmVzcGVjdGl2byBkZXRlbnRvciBkZXNzZXMgZGlyZWl0b3MgcGFyYSBjZWRlciDDoApVRlBFIG9zIGRpcmVpdG9zIHJlcXVlcmlkb3MgcG9yIGVzdGEgTGljZW7Dp2EgZSBhdXRvcml6YXIgYSB1bml2ZXJzaWRhZGUgYSB1dGlsaXrDoS1sb3MgbGVnYWxtZW50ZS4gRGVjbGFyYSB0YW1iw6ltIHF1ZSBlc3NlIG1hdGVyaWFsIGN1am9zIGRpcmVpdG9zIHPDo28gZGUgdGVyY2Vpcm9zIGVzdMOhIGNsYXJhbWVudGUgaWRlbnRpZmljYWRvIGUgcmVjb25oZWNpZG8gbm8gdGV4dG8gb3UgY29udGXDumRvIGRvIGRvY3VtZW50byBlbnRyZWd1ZTsKZykgU2UgbyBkb2N1bWVudG8gZW50cmVndWUgw6kgYmFzZWFkbyBlbSB0cmFiYWxobyBmaW5hbmNpYWRvIG91IGFwb2lhZG8gcG9yIG91dHJhIGluc3RpdHVpw6fDo28gcXVlIG7Do28gYSBVRlBFLMKgZGVjbGFyYSBxdWUgY3VtcHJpdSBxdWFpc3F1ZXIgb2JyaWdhw6fDtWVzIGV4aWdpZGFzIHBlbG8gcmVzcGVjdGl2byBjb250cmF0byBvdSBhY29yZG8uCgpBIFVGUEUgaWRlbnRpZmljYXLDoSBjbGFyYW1lbnRlIG8ocykgbm9tZShzKSBkbyhzKSBhdXRvciAoZXMpIGRvcyBkaXJlaXRvcyBkbyBkb2N1bWVudG8gZW50cmVndWUgZSBuw6NvIGZhcsOhIHF1YWxxdWVyIGFsdGVyYcOnw6NvLCBwYXJhIGFsw6ltIGRvIHByZXZpc3RvIG5hIGFsw61uZWEgYykuCg==Repositório InstitucionalPUBhttps://repositorio.ufpe.br/oai/requestattena@ufpe.bropendoar:22212019-10-26T03:39:43Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE)false
dc.title.pt_BR.fl_str_mv Building new probability distributions: the composition method and a computer based method
title Building new probability distributions: the composition method and a computer based method
spellingShingle Building new probability distributions: the composition method and a computer based method
PINHO, Luis Gustavo Bastos
Estatística
Probabilidade
title_short Building new probability distributions: the composition method and a computer based method
title_full Building new probability distributions: the composition method and a computer based method
title_fullStr Building new probability distributions: the composition method and a computer based method
title_full_unstemmed Building new probability distributions: the composition method and a computer based method
title_sort Building new probability distributions: the composition method and a computer based method
author PINHO, Luis Gustavo Bastos
author_facet PINHO, Luis Gustavo Bastos
author_role author
dc.contributor.authorLattes.pt_BR.fl_str_mv http://lattes.cnpq.br/7587599315362185
dc.contributor.advisorLattes.pt_BR.fl_str_mv http://lattes.cnpq.br/3268732497595112
dc.contributor.author.fl_str_mv PINHO, Luis Gustavo Bastos
dc.contributor.advisor1.fl_str_mv CORDEIRO, Gauss Moutinho
dc.contributor.advisor-co1.fl_str_mv NOBRE, Juvêncio Santos
contributor_str_mv CORDEIRO, Gauss Moutinho
NOBRE, Juvêncio Santos
dc.subject.por.fl_str_mv Estatística
Probabilidade
topic Estatística
Probabilidade
description We discuss the creation of new probability distributions for continuous data in two distinct approaches. The first one is, to our knowledge, novelty and consists of using Estimation of Distribution Algorithms (EDAs) to obtain new cumulative distribution functions. This class of algorithms work as follows. A population of solutions for a given problem is randomly selected from a space of candidates, which may contain candidates that are not feasible solutions to the problem. The selection occurs by following a set of probability rules that, initially, assign a uniform distribution to the space of candidates. Each individual is ranked by a fitness criterion. A fraction of the most fit individuals is selected and the probability rules are then adjusted to increase the likelihood of obtaining solutions similar to the most fit in the current population. The algorithm iterates until the set of probability rules are able to provide good solutions to the problem. In our proposal, the algorithm is used to generate cumulative distribution functions to model a given continuous data set. We tried to keep the mathematical expressions of the new functions as simple as possible. The results were satisfactory. We compared the models provided by the algorithm to the ones in already published papers. In every situation, the models proposed by the algorithms had advantages over the ones already published. The main advantage is the relative simplicity of the mathematical expressions obtained. Still in the context of computational tools and algorithms, we show the performance of simple neural networks as a method for parameter estimation in probability distributions. The motivation for this was the need to solve a large number of non linear equations when dealing with SAR images (SAR stands for synthetic aperture radar) in the statistical treatment of such images. The estimation process requires solving, iteratively, a non-linear equation. This is repeated for every pixel and an image usually consists of a large number of pixels. We trained a neural network to approximate an estimator for the parameter of interest. Once trained, the network can be fed the data and it will return an estimate of the parameter of interest without the need of iterative methods. The training of the network can take place even before collecting the data from the radar. The method was tested on simulated and real data sets with satisfactory results. The same method can be applied to different distributions. The second part of this thesis shows two new probability distribution classes obtained from the composition of already existing ones. In each situation, we present the new class and general results such as power series expansions for the probability density functions, expressions for the moments, entropy and alike. The first class is obtained from the composition of the beta-G and Lehmann-type II classes. The second class, from the transmuted-G and Marshall-Olkin-G classes. Distributions in these classes are compared to already existing ones as a way to illustrate the performance of applications to real data sets.
publishDate 2017
dc.date.issued.fl_str_mv 2017-01-17
dc.date.accessioned.fl_str_mv 2018-07-03T21:14:00Z
dc.date.available.fl_str_mv 2018-07-03T21:14:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://repositorio.ufpe.br/handle/123456789/24966
dc.identifier.dark.fl_str_mv ark:/64986/001300000grn9
url https://repositorio.ufpe.br/handle/123456789/24966
identifier_str_mv ark:/64986/001300000grn9
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv Attribution-NonCommercial-NoDerivs 3.0 Brazil
http://creativecommons.org/licenses/by-nc-nd/3.0/br/
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Attribution-NonCommercial-NoDerivs 3.0 Brazil
http://creativecommons.org/licenses/by-nc-nd/3.0/br/
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Universidade Federal de Pernambuco
dc.publisher.program.fl_str_mv Programa de Pos Graduacao em Estatistica
dc.publisher.initials.fl_str_mv UFPE
dc.publisher.country.fl_str_mv Brasil
publisher.none.fl_str_mv Universidade Federal de Pernambuco
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFPE
instname:Universidade Federal de Pernambuco (UFPE)
instacron:UFPE
instname_str Universidade Federal de Pernambuco (UFPE)
instacron_str UFPE
institution UFPE
reponame_str Repositório Institucional da UFPE
collection Repositório Institucional da UFPE
bitstream.url.fl_str_mv https://repositorio.ufpe.br/bitstream/123456789/24966/5/TESE%20Luis%20Gustavo%20Bastos%20Pinho.pdf.jpg
https://repositorio.ufpe.br/bitstream/123456789/24966/1/TESE%20Luis%20Gustavo%20Bastos%20Pinho.pdf
https://repositorio.ufpe.br/bitstream/123456789/24966/2/license_rdf
https://repositorio.ufpe.br/bitstream/123456789/24966/3/license.txt
https://repositorio.ufpe.br/bitstream/123456789/24966/4/TESE%20Luis%20Gustavo%20Bastos%20Pinho.pdf.txt
bitstream.checksum.fl_str_mv 03b4a687d9eaca077238acb00afb2654
4a1cf7340340bd8ff994a74abb62ba0e
e39d27027a6cc9cb039ad269a5db8e34
4b8a02c7f2818eaf00dcf2260dd5eb08
4343b78ab6fcb7cbdc9338870f09e229
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE)
repository.mail.fl_str_mv attena@ufpe.br
_version_ 1815172821101314048