Sparse Markov models for high-dimensional inference

Bibliographic details
Main author: Ost, Guilherme
Publication date: 2023
Other authors: Takahashi, Daniel Yasumasa
Document type: Article
Language: eng
Source title: Repositório Institucional da UFRN
Full text: https://repositorio.ufrn.br/handle/123456789/55432
Abstract: Finite-order Markov models are well-studied models for dependent finite alphabet data. Despite their generality, application in empirical work is rare when the order d is large relative to the sample size n (e.g., d=O(n)). Practitioners rarely use higher-order Markov models because (1) the number of parameters grows exponentially with the order, (2) the sample size n required to estimate each parameter grows exponentially with the order, and (3) the interpretation is often difficult. Here, we consider a subclass of Markov models called Mixture of Transition Distribution (MTD) models, proving that when the set of relevant lags is sparse (i.e., O(log(n))), we can consistently and efficiently recover the lags and estimate the transition probabilities of high-dimensional (d=O(n)) MTD models. Moreover, the estimated model allows straightforward interpretation. The key innovation is a recursive procedure for a priori selection of the relevant lags of the model. We prove a new structural result for the MTD and an improved martingale concentration inequality to prove our results. Using simulations, we show that our method performs well compared to other relevant methods. We also illustrate the usefulness of our method on weather data, where the proposed method correctly recovers the long-range dependence.
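The abstract describes an MTD model, in which the next-symbol distribution is a convex combination of single-lag transition kernels over a sparse set of relevant lags. A minimal illustrative sketch (not the authors' code; the lags, weights, and kernels below are arbitrary example values) over a binary alphabet:

```python
import numpy as np

# Illustrative MTD model: P(x_t = b | past) = sum_j lam[j] * K_j[x_{t-lag_j}, b],
# with a sparse set of relevant lags. All numbers here are made-up examples.
rng = np.random.default_rng(0)

lags = [1, 3]                 # sparse set of relevant lags
lam = np.array([0.6, 0.4])    # mixture weights, must sum to 1
# one row-stochastic 2x2 kernel per relevant lag
kernels = [np.array([[0.9, 0.1], [0.2, 0.8]]),
           np.array([[0.3, 0.7], [0.6, 0.4]])]

def mtd_next_prob(history):
    """Distribution of the next symbol given the history (most recent symbol last)."""
    p = np.zeros(2)
    for w, lag, K in zip(lam, lags, kernels):
        p += w * K[history[-lag]]
    return p

def simulate(n, d=3):
    """Simulate n symbols after a random initial context of length d."""
    x = list(rng.integers(0, 2, size=d))
    for _ in range(n):
        x.append(rng.choice(2, p=mtd_next_prob(x)))
    return np.array(x)

sample = simulate(1000)
```

Because each kernel depends on a single lagged symbol, the mixture weights directly quantify the influence of each lag, which is the interpretability advantage the abstract refers to.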
Citation: OST, Guilherme; TAKAHASHI, Daniel Y. Sparse Markov Models for High-dimensional Inference. Journal of Machine Learning Research, v. 24, n. 279, p. 1-54, 2023. Available at: https://www.jmlr.org/papers/v24/22-0266.html. Accessed: 22 Nov. 2023.
ISSN: 1533-7928
Subjects: Markov chains; High-dimensional inference; Mixture transition distribution
Rights: Attribution 3.0 Brazil (http://creativecommons.org/licenses/by/3.0/br/), open access
Institution: Universidade Federal do Rio Grande do Norte (UFRN)