Data imputation analysis for Cosmic Rays time series
Main author: | Fernandes, Ronabson Cardoso |
---|---|
Publication date: | 2017 |
Other authors: | Lúcio, Paulo Sérgio; Fernandez, José Henrique |
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da UFRN |
Full text: | https://repositorio.ufrn.br/jspui/handle/123456789/29847 |
Abstract: | Missing data are inevitable in Galactic Cosmic Ray (GCR) time series: records are lost to mechanical failure, human error and technical problems, and GCR stations have operated over different periods. The aim of this study was to perform multiple imputation of the datasets in order to reproduce the observational record. Monthly GCR time series from the Climax (CLMX) and Roma (ROME) stations, from 1960 to 2004, were used to simulate scenarios with 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% and 90% of the observed ROME series missing, with 50 replicates each. The CLMX station was used as a proxy for filling these scenarios. Three methods for imputing monthly data were selected: Amelia II, which runs a bootstrap Expectation Maximization algorithm; MICE, which runs Multivariate Imputation by Chained Equations; and MTSDI, an Expectation Maximization-based method for imputing missing values in multivariate normal time series. The synthetic time series were compared with the observed ROME series using several skill measures: RMSE, NRMSE, Agreement Index, R, R2, F-test and t-test. For CLMX and ROME, the R2 and R statistics were equal to 0.98 and 0.96, respectively. Increasing the number of gaps reduced the quality of the reconstructed time series. Imputation was most efficient with the MTSDI method, which gave negligible errors and the best skill coefficients. The results suggest an upper limit of about 60% missing data for imputation of monthly averages. It is noteworthy that the CLMX, ROME and KIEL stations have no missing data in the target period. This methodology allowed 43 time series to be reconstructed. |
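The experimental design summarised in the abstract (mask a fraction of an observed series, fill the gaps from a proxy station, then score the result against the observations) can be sketched roughly as follows. This is only an illustration under stated assumptions: the two series are synthetic stand-ins for CLMX and ROME, the imputation step is a plain least-squares regression on the proxy rather than Amelia II, MICE or MTSDI, and the NRMSE normalisation (by observed range) is assumed because the record does not specify one. All function names are hypothetical.

```python
import math
import random
import statistics


def synthetic_pair(n_months, rng):
    """Hypothetical proxy/target pair: a shared 11-year cycle plus independent noise."""
    proxy, target = [], []
    for m in range(n_months):
        cycle = 100.0 + 10.0 * math.sin(2 * math.pi * m / 132)
        proxy.append(cycle + rng.gauss(0, 1))
        target.append(0.9 * cycle + 5.0 + rng.gauss(0, 1))
    return proxy, target


def mask_fraction(series, fraction, rng):
    """Return a copy of `series` with `fraction` of its values replaced by None."""
    n_missing = round(fraction * len(series))
    missing = set(rng.sample(range(len(series)), n_missing))
    return [None if i in missing else x for i, x in enumerate(series)]


def regression_impute(target, proxy):
    """Fill gaps in `target` by least-squares regression on `proxy` (a stand-in method)."""
    pairs = [(p, t) for p, t in zip(proxy, target) if t is not None]
    mean_p = statistics.fmean(p for p, _ in pairs)
    mean_t = statistics.fmean(t for _, t in pairs)
    sxx = sum((p - mean_p) ** 2 for p, _ in pairs)
    sxy = sum((p - mean_p) * (t - mean_t) for p, t in pairs)
    slope = sxy / sxx
    intercept = mean_t - slope * mean_p
    return [intercept + slope * p if t is None else t for p, t in zip(proxy, target)]


def rmse(obs, est):
    """Root mean square error over the whole series."""
    return math.sqrt(statistics.fmean((e - o) ** 2 for o, e in zip(obs, est)))


def nrmse(obs, est):
    """RMSE divided by the observed range (an assumed normalisation)."""
    return rmse(obs, est) / (max(obs) - min(obs))


def agreement_index(obs, est):
    """Willmott's index of agreement d; 1 means perfect agreement."""
    mean_o = statistics.fmean(obs)
    num = sum((e - o) ** 2 for o, e in zip(obs, est))
    den = sum((abs(e - mean_o) + abs(o - mean_o)) ** 2 for o, e in zip(obs, est))
    return 1.0 - num / den


def pearson_r(x, y):
    """Pearson correlation coefficient R; R2 is its square."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def run_experiment(fractions, n_replicates=50, n_months=540, seed=0):
    """Mean RMSE per missing-data fraction, averaged over random replicates."""
    rng = random.Random(seed)
    proxy, observed = synthetic_pair(n_months, rng)
    summary = {}
    for frac in fractions:
        scores = []
        for _ in range(n_replicates):
            gappy = mask_fraction(observed, frac, rng)
            filled = regression_impute(gappy, proxy)
            scores.append(rmse(observed, filled))
        summary[frac] = statistics.fmean(scores)
    return summary


if __name__ == "__main__":
    for frac, score in run_experiment([i / 10 for i in range(1, 10)]).items():
        print(f"{frac:.0%} missing: mean RMSE = {score:.3f}")
```

The defaults mirror the numbers given in the abstract: 540 months covers 1960 to 2004, and each scenario is repeated 50 times. The absolute scores depend entirely on the synthetic data and say nothing about the published results.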
id |
UFRN_f2c930894bf420b3992dc1b7cdc05bfb |
---|---|
oai_identifier_str |
oai:https://repositorio.ufrn.br:123456789/29847 |
network_acronym_str |
UFRN |
network_name_str |
Repositório Institucional da UFRN |
repository_id_str |
|
dc.title.pt_BR.fl_str_mv |
Data imputation analysis for Cosmic Rays time series |
author |
Fernandes, Ronabson Cardoso |
author_facet |
Fernandes, Ronabson Cardoso Lúcio, Paulo Sérgio Fernandez, José Henrique |
author_role |
author |
author2 |
Lúcio, Paulo Sérgio Fernandez, José Henrique |
author2_role |
author author |
dc.contributor.author.fl_str_mv |
Fernandes, Ronabson Cardoso Lúcio, Paulo Sérgio Fernandez, José Henrique |
dc.subject.por.fl_str_mv |
Bootstrap Expectation maximization Skill Multivariate Chained equations |
topic |
Bootstrap Expectation maximization Skill Multivariate Chained equations |
publishDate |
2017 |
dc.date.issued.fl_str_mv |
2017 |
dc.date.accessioned.fl_str_mv |
2020-08-18T15:01:41Z |
dc.date.available.fl_str_mv |
2020-08-18T15:01:41Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
FERNANDES, R.C.; LUCIO, P.S.; FERNANDEZ, J.H. Data imputation analysis for cosmic rays time series. Advances in Space Research, [s.l.], v. 59, n. 9, p. 2442-2457, May 2017. Available at: https://www.sciencedirect.com/science/article/abs/pii/S0273117717301199?via%3Dihub. Accessed: 14 Aug. 2020. http://dx.doi.org/10.1016/j.asr.2017.02.022. |
dc.identifier.uri.fl_str_mv |
https://repositorio.ufrn.br/jspui/handle/123456789/29847 |
dc.identifier.issn.none.fl_str_mv |
0273-1177 |
dc.identifier.doi.none.fl_str_mv |
10.1016/j.asr.2017.02.022 |
url |
https://repositorio.ufrn.br/jspui/handle/123456789/29847 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
Attribution 3.0 Brazil http://creativecommons.org/licenses/by/3.0/br/ info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Attribution 3.0 Brazil http://creativecommons.org/licenses/by/3.0/br/ |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
Elsevier |
publisher.none.fl_str_mv |
Elsevier |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFRN instname:Universidade Federal do Rio Grande do Norte (UFRN) instacron:UFRN |
instname_str |
Universidade Federal do Rio Grande do Norte (UFRN) |
instacron_str |
UFRN |
institution |
UFRN |
reponame_str |
Repositório Institucional da UFRN |
collection |
Repositório Institucional da UFRN |
bitstream.url.fl_str_mv |
https://repositorio.ufrn.br/bitstream/123456789/29847/2/license_rdf https://repositorio.ufrn.br/bitstream/123456789/29847/3/license.txt https://repositorio.ufrn.br/bitstream/123456789/29847/4/DataImputationAnalysis_FERNANDEZ_2017.pdf.txt https://repositorio.ufrn.br/bitstream/123456789/29847/5/DataImputationAnalysis_FERNANDEZ_2017.pdf.jpg |
bitstream.checksum.fl_str_mv |
4d2950bda3d176f570a9f8b328dfbbef 6e6f57145bc87daf99079f06b081ff9f 91d85b676f3ad46c6b7d18aa42988645 b1e9ac32b401c7dadaa63c692ef350d3 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFRN - Universidade Federal do Rio Grande do Norte (UFRN) |
repository.mail.fl_str_mv |
|
_version_ |
1814832704159481856 |