A Novel Nonparametric Distance Estimator for Densities with Error Bounds

Bibliographic details
Main author: Alexandre R. F. Carvalho
Publication date: 2013
Other authors: João Manuel R. S. Tavares, José C. Principe
Document type: Article
Language: English
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: https://repositorio-aberto.up.pt/handle/10216/66052
Abstract: The use of a metric to assess distance between probability densities is an important practical problem. In this work, a particular metric induced by an α-divergence is studied. The Hellinger metric can be interpreted as a particular case within the framework of generalized Tsallis divergences and entropies. The nonparametric Parzen's density estimator emerges as a natural candidate to estimate the underlying probability density function, since it may account for data from different groups, or experiments with distinct instrumental precisions, i.e., non-independent and identically distributed (non-i.i.d.) data. However, the information-theoretic metric derived from the nonparametric Parzen's density estimator displays infinite variance, limiting the direct use of resampling estimators. Based on measure theory, we present a change of measure to build a finite variance density allowing the use of resampling estimators. In order to counteract the poor scaling with dimension, we propose a new nonparametric two-stage robust resampling estimator of Hellinger's metric error bounds for heteroscedastic data. The approach presents very promising results allowing the use of different covariances for different clusters with impact on the distance evaluation.
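Illustrative note: the quantities named in the abstract can be sketched in a few lines. The code below is a generic textbook sketch, not the two-stage estimator proposed in the article. It assumes one-dimensional data, Gaussian kernels with one bandwidth per sample (a simple stand-in for heteroscedastic data), and the convention H² = 1 − ∫√(p q) dx. The function names `parzen_pdf`, `parzen_sample` and `hellinger_mc` are hypothetical. Sampling from the mixture m = (p + q)/2 keeps every integrand value √(p q)/m at or below 1, so this particular Monte Carlo average has finite variance.

```python
import math
import random


def parzen_pdf(x, centers, bandwidths):
    """Parzen density at x: equal-weight mixture of Gaussian kernels,
    one kernel per sample, each with its own bandwidth (assumption)."""
    total = 0.0
    for c, h in zip(centers, bandwidths):
        z = (x - c) / h
        total += math.exp(-0.5 * z * z) / (h * math.sqrt(2.0 * math.pi))
    return total / len(centers)


def parzen_sample(centers, bandwidths, rng):
    """Draw one point from the Parzen density: pick a kernel, then sample it."""
    i = rng.randrange(len(centers))
    return rng.gauss(centers[i], bandwidths[i])


def hellinger_mc(p, q, n_draws=20000, seed=0):
    """Monte Carlo estimate of the Hellinger distance between two Parzen
    densities p and q, each given as a (centers, bandwidths) pair.

    Points are drawn from the mixture m = (p + q) / 2 and the Bhattacharyya
    coefficient is estimated as the mean of sqrt(p * q) / m."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_draws):
        src = p if rng.random() < 0.5 else q
        x = parzen_sample(src[0], src[1], rng)
        px = parzen_pdf(x, p[0], p[1])
        qx = parzen_pdf(x, q[0], q[1])
        m = 0.5 * (px + qx)
        acc += math.sqrt(px * qx) / m
    bc = acc / n_draws
    # Clamp against rounding so the square root stays real.
    return math.sqrt(max(0.0, 1.0 - bc))


if __name__ == "__main__":
    # Two small samples with different per-sample bandwidths.
    p = ([0.0, 0.5, 1.0], [0.3, 0.3, 0.6])
    q = ([0.8, 1.5, 2.0], [0.5, 0.2, 0.2])
    print(hellinger_mc(p, q))
```

For two single Gaussian kernels with equal bandwidth σ and centers μ₁, μ₂, the Bhattacharyya coefficient has the closed form exp(−(μ₁ − μ₂)² / (8σ²)), which gives a quick sanity check for the estimate.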
id RCAP_5988a05cdafb282f4f2b6d797bbe5ce6
oai_identifier_str oai:repositorio-aberto.up.pt:10216/66052
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.contributor.author.fl_str_mv Alexandre R. F. Carvalho
João Manuel R. S. Tavares
José C. Principe
dc.subject.por.fl_str_mv Ciências Tecnológicas, Ciências da engenharia e tecnologias
Technological sciences, Engineering and technology
publishDate 2013
dc.date.none.fl_str_mv 2013
2013-01-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://repositorio-aberto.up.pt/handle/10216/66052
url https://repositorio-aberto.up.pt/handle/10216/66052
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 1099-4300
10.3390/e15051609
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799136053583413248