Evolving discrete dynamic bayesian networks: an approach for time series

Bibliographic details
Main author: Santos, Talysson Manoel de Oliveira
Publication date: 2023
Document type: Thesis
Language: eng
Source title: Biblioteca Digital de Teses e Dissertações da USP
Full text: https://www.teses.usp.br/teses/disponiveis/18/18153/tde-20092023-105645/
Abstract: Knowledge discovery in time series datasets is a subject of great interest and importance in academia and industry. To this end, a range of theories and computational tools has been proposed and used to extract useful information from time series and to assist decision-making in different areas. Among these, a Bayesian network is a probabilistic graphical model that represents a set of random variables and their conditional statistical dependencies via a directed acyclic graph (DAG). This doctoral research proposes a methodology for dealing with time series based on evolving discrete Dynamic Bayesian Networks (EDBN), which uses an analytical threshold to select directed edges by their occurrence frequency as new datasets are collected. As each new dataset arrives, the algorithm learns the structure of a DBN using a score metric and the hill-climbing method, and then applies the analytical threshold to select the directed edges between nodes by their occurrence frequency. The developed method converges smoothly to a robust model and continually adapts to the arrival of new data, yielding more reliable network models. A discrete model was chosen because it is a non-parametric approach that can suit different data behaviours without manual modification, i.e., it is entirely data-driven. The proposal was evaluated on real time series datasets in two contexts that have received considerable attention from researchers in recent years: data imputation and forecasting of CO2 emissions during energy generation. Compared against widely used imputation methods, the proposed approach proved capable of handling data imputation in time series datasets for data missing completely at random and data missing not at random. For CO2 emissions forecasting in multi-source power generation systems, real datasets from Belgium, Germany, Portugal, and Spain were used. The proposed approach proved capable of forecasting CO2 emissions in the systems evaluated in this study. Compared with a traditional DBN that does not evolve its structure over time, the proposed approach was superior, which highlights its contribution to performance improvement. The proposed method also outperformed the other traditional methods it was compared with. Moreover, the model is computationally efficient, making it a good option for embedding in online applications that deal with time series.
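The edge-selection step described in the abstract can be illustrated with a minimal Python sketch. It assumes that a structure-learning routine (for example, score-based hill climbing) has already produced one set of directed edges per collected dataset. The function name `select_frequent_edges`, the node names, and the fixed threshold of 0.5 are hypothetical stand-ins: the abstract does not state the thesis's analytical threshold, so this is not the author's implementation.

```python
from collections import Counter


def select_frequent_edges(structures, threshold=0.5):
    """Return edges whose occurrence frequency across structures is >= threshold.

    `structures` is a list of edge collections, one per collected dataset.
    The threshold is a hypothetical fixed value, not the thesis's analytical one.
    """
    if not structures:
        return set()
    # Count each edge at most once per learned structure.
    counts = Counter(edge for edges in structures for edge in set(edges))
    total = len(structures)
    return {edge for edge, count in counts.items() if count / total >= threshold}


# Hypothetical edge sets, as if learned from three successive datasets.
learned = [
    {("load[t-1]", "co2[t]"), ("wind[t-1]", "co2[t]")},
    {("load[t-1]", "co2[t]")},
    {("load[t-1]", "co2[t]"), ("solar[t-1]", "co2[t]")},
]
robust_edges = select_frequent_edges(learned, threshold=0.5)
print(robust_edges)
```

In this sketch only the edge present in all three structures is kept; the two edges that appear once (frequency 1/3) fall below the assumed threshold and are dropped.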
id USP_650ab358e4fe236b7d67f974a5e6b0ac
oai_identifier_str oai:teses.usp.br:tde-20092023-105645
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
repository_id_str 2721
dc.title.none.fl_str_mv Evolving discrete dynamic bayesian networks: an approach for time series
Evolução de redes bayesianas dinâmicas discretas: uma abordagem para lidar com séries temporais
title Evolving discrete dynamic bayesian networks: an approach for time series
author Santos, Talysson Manoel de Oliveira
author_facet Santos, Talysson Manoel de Oliveira
author_role author
dc.contributor.none.fl_str_mv Silva, Ivan Nunes da
dc.contributor.author.fl_str_mv Santos, Talysson Manoel de Oliveira
dc.subject.por.fl_str_mv CO2 emissions forecasting
evolving dynamic bayesian network
learning of robust structures
missing data
time series
publishDate 2023
dc.date.none.fl_str_mv 2023-08-16
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://www.teses.usp.br/teses/disponiveis/18/18153/tde-20092023-105645/
url https://www.teses.usp.br/teses/disponiveis/18/18153/tde-20092023-105645/
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv Release the content for public access.
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Release the content for public access.
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
instname_str Universidade de São Paulo (USP)
instacron_str USP
institution USP
reponame_str Biblioteca Digital de Teses e Dissertações da USP
collection Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br
_version_ 1815257183069143040