Data imputation analysis for Cosmic Rays time series

Bibliographic details
Main author: Fernandes, Ronabson Cardoso
Publication date: 2017
Other authors: Lúcio, Paulo Sérgio, Fernandez, José Henrique
Document type: Article
Language: eng
Source title: Repositório Institucional da UFRN
Full text: https://repositorio.ufrn.br/jspui/handle/123456789/29847
Abstract: Missing data in Galactic Cosmic Ray (GCR) time series are inevitable: data are lost through mechanical and human failure, technical problems, and the different periods of operation of GCR stations. The aim of this study was to perform multiple imputation in order to reproduce the observational dataset. The study used the monthly GCR time series of Climax (CLMX) and Roma (ROME) from 1960 to 2004 to simulate scenarios in which 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% and 90% of the observed ROME series was missing, with 50 replicates. The CLMX station was then used as a proxy for imputing the gaps in these scenarios. Three methods for monthly dataset imputation were selected: AMÉLIA II, which runs the bootstrap Expectation Maximization algorithm; MICE, which runs Multivariate Imputation by Chained Equations; and MTSDI, an Expectation Maximization-based method for imputing missing values in multivariate normal time series. The synthetic time series were compared with the observed ROME series using several skill measures: RMSE, NRMSE, Agreement Index, R, R2, F-test and t-test. For CLMX and ROME, the R and R2 statistics were equal to 0.98 and 0.96, respectively. The quality of the reconstructed series decreased as the number of gaps increased. Imputation was most efficient with the MTSDI method, which gave negligible errors and the best skill coefficients. For monthly averages, the results suggest that imputation should be limited to series with no more than about 60% missing data. It is noteworthy that the CLMX, ROME and KIEL stations have no missing data in the target period. This methodology allowed 43 time series to be reconstructed.
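The evaluation design described in the abstract (masking a fraction of the ROME series, filling the gaps with CLMX as a proxy, and scoring the result against the observed values) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' procedure: both series are synthetic, a simple linear regression on the proxy stands in for the three methods named in the abstract, NRMSE is assumed to be normalized by the observed range, and the Agreement Index is assumed to be Willmott's index of agreement. All function names are hypothetical.

```python
import numpy as np


def mask_fraction(series, fraction, rng):
    """Return a copy of `series` with `fraction` of its values set to NaN."""
    out = np.asarray(series, dtype=float).copy()
    n_missing = int(round(fraction * out.size))
    idx = rng.choice(out.size, size=n_missing, replace=False)
    out[idx] = np.nan
    return out


def impute_from_proxy(target, proxy):
    """Fill NaNs in `target` by least-squares regression on `proxy`.

    Stand-in method (assumption): not Amelia II, MICE or MTSDI.
    """
    target = np.asarray(target, dtype=float)
    proxy = np.asarray(proxy, dtype=float)
    seen = ~np.isnan(target)
    slope, intercept = np.polyfit(proxy[seen], target[seen], 1)
    filled = target.copy()
    filled[~seen] = slope * proxy[~seen] + intercept
    return filled


def skill(obs, sim):
    """Compute RMSE, range-normalized RMSE, R, R2 and Willmott's d."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    err = sim - obs
    rmse = np.sqrt(np.mean(err ** 2))
    nrmse = rmse / (obs.max() - obs.min())  # normalization choice is an assumption
    r = np.corrcoef(obs, sim)[0, 1]
    spread = np.abs(sim - obs.mean()) + np.abs(obs - obs.mean())
    d = 1.0 - np.sum(err ** 2) / np.sum(spread ** 2)
    return {"rmse": rmse, "nrmse": nrmse, "r": r, "r2": r ** 2, "d": d}


# Synthetic monthly series for 1960-2004 (45 years); values are invented.
rng = np.random.default_rng(0)
n_months = 45 * 12
t = np.arange(n_months)
proxy = 4000 + 300 * np.sin(2 * np.pi * t / 132) + rng.normal(0, 30, n_months)
target = 0.9 * proxy + 200 + rng.normal(0, 40, n_months)

# Nine missing-data scenarios, 50 replicates each, as in the abstract.
for fraction in np.arange(1, 10) / 10:
    rmses = [
        skill(target, impute_from_proxy(mask_fraction(target, fraction, rng), proxy))["rmse"]
        for _ in range(50)
    ]
    print(f"{fraction:.0%} missing: mean RMSE = {np.mean(rmses):.2f}")
```

Each replicate is scored against the full observed series, so the unmasked points contribute zero error. Scoring only the imputed positions would be an equally defensible choice and would yield larger error values.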
id UFRN_f2c930894bf420b3992dc1b7cdc05bfb
oai_identifier_str oai:https://repositorio.ufrn.br:123456789/29847
network_acronym_str UFRN
network_name_str Repositório Institucional da UFRN
repository_id_str
dc.title.pt_BR.fl_str_mv Data imputation analysis for Cosmic Rays time series
title Data imputation analysis for Cosmic Rays time series
spellingShingle Data imputation analysis for Cosmic Rays time series
Fernandes, Ronabson Cardoso
Bootstrap
Expectation maximization
Skill
Multivariate
Chained equations
author Fernandes, Ronabson Cardoso
author_facet Fernandes, Ronabson Cardoso
Lúcio, Paulo Sérgio
Fernandez, José Henrique
author_role author
author2 Lúcio, Paulo Sérgio
Fernandez, José Henrique
author2_role author
author
dc.contributor.author.fl_str_mv Fernandes, Ronabson Cardoso
Lúcio, Paulo Sérgio
Fernandez, José Henrique
dc.subject.por.fl_str_mv Bootstrap
Expectation maximization
Skill
Multivariate
Chained equations
topic Bootstrap
Expectation maximization
Skill
Multivariate
Chained equations
publishDate 2017
dc.date.issued.fl_str_mv 2017
dc.date.accessioned.fl_str_mv 2020-08-18T15:01:41Z
dc.date.available.fl_str_mv 2020-08-18T15:01:41Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.citation.fl_str_mv FERNANDES, R.C.; LUCIO, P.S.; FERNANDEZ, J.H. Data imputation analysis for cosmic rays time series. Advances in Space Research, [s.l.], v. 59, n. 9, p. 2442-2457, May 2017. Available at: https://www.sciencedirect.com/science/article/abs/pii/S0273117717301199?via%3Dihub. Accessed: 14 Aug. 2020. http://dx.doi.org/10.1016/j.asr.2017.02.022.
dc.identifier.uri.fl_str_mv https://repositorio.ufrn.br/jspui/handle/123456789/29847
dc.identifier.issn.none.fl_str_mv 0273-1177
dc.identifier.doi.none.fl_str_mv 10.1016/j.asr.2017.02.022
url https://repositorio.ufrn.br/jspui/handle/123456789/29847
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv Attribution 3.0 Brazil
http://creativecommons.org/licenses/by/3.0/br/
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Attribution 3.0 Brazil
http://creativecommons.org/licenses/by/3.0/br/
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Elsevier
publisher.none.fl_str_mv Elsevier
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFRN
instname:Universidade Federal do Rio Grande do Norte (UFRN)
instacron:UFRN
instname_str Universidade Federal do Rio Grande do Norte (UFRN)
instacron_str UFRN
institution UFRN
reponame_str Repositório Institucional da UFRN
collection Repositório Institucional da UFRN
bitstream.url.fl_str_mv https://repositorio.ufrn.br/bitstream/123456789/29847/2/license_rdf
https://repositorio.ufrn.br/bitstream/123456789/29847/3/license.txt
https://repositorio.ufrn.br/bitstream/123456789/29847/4/DataImputationAnalysis_FERNANDEZ_2017.pdf.txt
https://repositorio.ufrn.br/bitstream/123456789/29847/5/DataImputationAnalysis_FERNANDEZ_2017.pdf.jpg
bitstream.checksum.fl_str_mv 4d2950bda3d176f570a9f8b328dfbbef
6e6f57145bc87daf99079f06b081ff9f
91d85b676f3ad46c6b7d18aa42988645
b1e9ac32b401c7dadaa63c692ef350d3
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFRN - Universidade Federal do Rio Grande do Norte (UFRN)
repository.mail.fl_str_mv
_version_ 1814832704159481856