Time series forecasting with deep forest regression

Bibliographic details
Main author: ANDRADE, Renata Correia de
Publication date: 2020
Document type: Dissertation
Language: eng
Source title: Repositório Institucional da UFPE
Full text: https://repositorio.ufpe.br/handle/123456789/43505
Abstract: A time series is a collection of ordered observations, usually measured at repeated intervals. Time series forecasting is the area of research that studies methods for predicting future values of a series. Forecasting methods range from statistical procedures, such as ARIMA, to more recent machine learning approaches. Deep neural networks (DNNs) have shown good performance on a great number of tasks, including time series forecasting, for which DNNs are considered state-of-the-art. Most deep models today are neural networks, but despite their popularity and proven competitive performance when compared to other machine learning algorithms, DNNs still face some limitations. Most notably, they usually require a large number of training examples, which may be unavailable for smaller time series, and they have a large number of hyper-parameters that need to be tuned to individual datasets. Multi-grained cascade forest (gcForest) is a deep machine learning algorithm that was proposed for classification and that addresses these limitations of DNNs while replicating the features responsible for the success of this type of model. The goal of this dissertation is to adapt the original gcForest algorithm to regression problems, so that it can be applied to time series forecasting. The influence of the two stages of gcForest, multi-grained scanning and cascade forest, is also investigated. Also explored is the possibility of adding an extra model at the end of the cascade forest structure, thereby changing the way the final result is calculated. The changes to the algorithm are presented, and its performance is evaluated on four time series datasets according to three metrics: mean squared error, mean absolute error and mean absolute percentage error. Results show that gcForest achieves competitive performance on all four datasets when compared to traditional machine learning models.
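As a rough illustration of the evaluation setup named in the abstract, the sketch below frames a series as a regression problem and computes the three metrics. The sliding-window framing, the function names (`make_windows`, `mse`, `mae`, `mape`), the toy series and the last-value baseline are all assumptions made for illustration; the record does not say how the dissertation builds its inputs, and the baseline is a stand-in, not gcForest.

```python
def make_windows(series, n_lags):
    """Turn a series into (lag window, next value) regression pairs.

    Assumed framing: each target is predicted from the n_lags values before it.
    """
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return X, y


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero.
    """
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Toy series, invented for this example.
series = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
X, y = make_windows(series, n_lags=2)

# Placeholder model: repeat the last value of each window.
# Any regressor trained on (X, y) could be swapped in here.
predicted = [window[-1] for window in X]

print(mse(y, predicted), mae(y, predicted), mape(y, predicted))
```

Reporting all three metrics together is useful because they penalise errors differently: mean squared error weights large misses more heavily, mean absolute error treats all misses linearly, and mean absolute percentage error is scale-free, which helps when comparing across datasets with different ranges.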