Imputation by decomposition and by time series nature : novel imputation methods for missing data in time series
Main author: | Silvana Mara Ribeiro |
---|---|
Publication date: | 2021 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Institucional da UFMG |
Full text: | http://hdl.handle.net/1843/46099 https://orcid.org/0000-0002-6754-4374 |
Abstract: | Dealing with missing values in time series data is an important but often overlooked step in data analysis. In this dissertation, the patterns of time series data and the mechanisms of missingness are described to help identify which imputation method should be used, along with a review of imputation methods and how they work. Methods recommended in the literature are used to impute synthetic data with different patterns, and the results are discussed. Two new methods for imputing missing time steps are presented and compared with classical and state-of-the-art imputation methods. The first is Imputation by Pattern, which is based on the premise that imputing each time series with the method the literature recommends for its pattern will achieve the best results. Heuristics are proposed to separate time series by pattern. The second is Imputation by Decomposition, which decomposes the time series into its components and then imputes each component using the literature-recommended methods. Combinations of these methods with the Kalman filter are also tested. The discussed imputation methods are applied to a data set of financial indices and instability trackers, a COVID-19 data set, and a dengue data set; predictions are then made and the results are presented. The Imputation by Pattern method combined with the Kalman filter achieved consistently satisfactory results, although not always the best ones. The Imputation by Decomposition method achieved good results, especially when some time was spent investigating which variation worked best for each data set. Overall, both imputation methods achieved results similar to, and in some cases better than, those of the classical imputation methods. |
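The decompose-then-impute idea described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's algorithm: it assumes an additive model (trend + seasonal + remainder), a known seasonal period, and at least one observed value. The function name `impute_by_decomposition`, the `period` parameter, the moving-average trend, and the per-phase seasonal means are all illustrative assumptions chosen to keep the example short.

```python
import numpy as np


def impute_by_decomposition(y, period):
    """Fill NaNs in y via an additive trend + seasonal + remainder split.

    Illustrative sketch only; assumes a known period and some observed data.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    missing = np.isnan(y)
    idx = np.arange(n)
    half = period // 2

    # Trend: centered moving average that skips missing points.
    trend = np.full(n, np.nan)
    for t in range(n):
        window = y[max(0, t - half): t + half + 1]
        observed = window[~np.isnan(window)]
        if observed.size:
            trend[t] = observed.mean()
    # Windows with no observations get a linearly interpolated trend.
    known = ~np.isnan(trend)
    trend = np.interp(idx, idx[known], trend[known])

    # Seasonal: mean detrended value at each position within the period.
    detrended = y - trend
    seasonal_means = np.zeros(period)
    for k in range(period):
        phase = detrended[k::period]
        phase = phase[~np.isnan(phase)]
        if phase.size:
            seasonal_means[k] = phase.mean()
    seasonal_means -= seasonal_means.mean()  # center the seasonal component
    seasonal = seasonal_means[idx % period]

    # Remainder: impute with the mean of the observed remainder.
    remainder = y - trend - seasonal
    remainder_fill = remainder[~missing].mean()

    # Recompose, touching only the originally missing positions.
    return np.where(missing, trend + seasonal + remainder_fill, y)


# Synthetic example: linear trend plus a 12-step seasonal cycle with gaps.
t = np.arange(120)
truth = 0.1 * t + 2.0 * np.sin(2 * np.pi * t / 12)
y = truth.copy()
y[[10, 25, 26, 50, 77, 100]] = np.nan
filled = impute_by_decomposition(y, period=12)
print(np.abs(filled - truth).max())
```

Each component is imputed with a deliberately simple rule here. In the spirit of the abstract, one could swap in a different imputation method per component, which is where the choice of "variation" for a given data set would come in.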
id |
UFMG_c52974862cd94c9d1220dbf9f2f7ee37 |
---|---|
oai_identifier_str |
oai:repositorio.ufmg.br:1843/46099 |
network_acronym_str |
UFMG |
network_name_str |
Repositório Institucional da UFMG |
repository_id_str |
|
dc.title.pt_BR.fl_str_mv |
Imputation by decomposition and by time series nature : novel imputation methods for missing data in time series |
dc.title.alternative.pt_BR.fl_str_mv |
Imputação por decomposição e pela natureza da série temporal : novos métodos de imputação para dados ausentes em séries temporais |
author |
Silvana Mara Ribeiro |
author_facet |
Silvana Mara Ribeiro |
author_role |
author |
dc.contributor.advisor1.fl_str_mv |
Cristiano Leite de Castro |
dc.contributor.advisor1Lattes.fl_str_mv |
http://lattes.cnpq.br/7892966809901738 |
dc.contributor.referee1.fl_str_mv |
Luis Antonio Aguirre |
dc.contributor.referee2.fl_str_mv |
Frederico Gadelha Guimarães |
dc.contributor.authorLattes.fl_str_mv |
http://lattes.cnpq.br/8320252058050644 |
dc.contributor.author.fl_str_mv |
Silvana Mara Ribeiro |
contributor_str_mv |
Cristiano Leite de Castro Luis Antonio Aguirre Frederico Gadelha Guimarães |
dc.subject.por.fl_str_mv |
Missing data Time series Imputation methods Decomposition Pattern |
topic |
Missing data Time series Imputation methods Decomposition Pattern Engenharia elétrica Análise de séries temporais Ausência de dados (Estatística) Ciências sociais - Métodos estatísticos |
dc.subject.other.pt_BR.fl_str_mv |
Engenharia elétrica Análise de séries temporais Ausência de dados (Estatística) Ciências sociais - Métodos estatísticos |
publishDate |
2021 |
dc.date.issued.fl_str_mv |
2021-07-28 |
dc.date.accessioned.fl_str_mv |
2022-10-07T17:37:19Z |
dc.date.available.fl_str_mv |
2022-10-07T17:37:19Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/1843/46099 |
dc.identifier.orcid.pt_BR.fl_str_mv |
https://orcid.org/0000-0002-6754-4374 |
url |
http://hdl.handle.net/1843/46099 https://orcid.org/0000-0002-6754-4374 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
http://creativecommons.org/licenses/by-nc-nd/3.0/pt/ info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
http://creativecommons.org/licenses/by-nc-nd/3.0/pt/ |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
Universidade Federal de Minas Gerais |
dc.publisher.program.fl_str_mv |
Programa de Pós-Graduação em Engenharia Elétrica |
dc.publisher.initials.fl_str_mv |
UFMG |
dc.publisher.country.fl_str_mv |
Brasil |
dc.publisher.department.fl_str_mv |
ENG - DEPARTAMENTO DE ENGENHARIA ELÉTRICA |
publisher.none.fl_str_mv |
Universidade Federal de Minas Gerais |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFMG instname:Universidade Federal de Minas Gerais (UFMG) instacron:UFMG |
instname_str |
Universidade Federal de Minas Gerais (UFMG) |
instacron_str |
UFMG |
institution |
UFMG |
reponame_str |
Repositório Institucional da UFMG |
collection |
Repositório Institucional da UFMG |
bitstream.url.fl_str_mv |
https://repositorio.ufmg.br/bitstream/1843/46099/1/Imputation%20by%20decomposition%20and%20by%20time%20series%20nature-%20novel%20imputation%20methods%20for%20missing%20data%20in%20time%20series.pdf https://repositorio.ufmg.br/bitstream/1843/46099/2/license_rdf https://repositorio.ufmg.br/bitstream/1843/46099/3/license.txt |
bitstream.checksum.fl_str_mv |
bc7257b396996fe1bc85b5af10bfc960 cfd6801dba008cb6adbd9838b81582ab cda590c95a0b51b4d15f60c9642ca272 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG) |
repository.mail.fl_str_mv |
|
_version_ |
1803589522727895040 |