Algorithm performance optimization for outlier detection in time series

Lima, Beatriz Ferreira de

Bibliographic details
Main author:	Lima, Beatriz Ferreira de
Publication date:	2022
Document type:	Undergraduate thesis (trabalho de conclusão de curso)
Language:	Portuguese (por)
Source:	Repositório Institucional da UNESP
Full text:	http://hdl.handle.net/11449/216225
Abstract:	Time series are present in several areas of great economic value, such as the stock market and industry. Under Big Data conditions, data may be processed in real time or near real time, and sources may generate high volumes of data; meeting these requirements demands that the data preparation phase clean the data both effectively and efficiently. Several algorithms can carry out this process, but they may have limitations such as low performance, data distortion, and long processing times. One of the problems to be addressed during data preparation is outlier detection: outliers are data points that may reflect distortions and can add costs to the data cleaning phase. This work therefore proposes an algorithm that detects outliers and then cleans them effectively and efficiently, while seeking to preserve the completeness of the information carried by the time series data. In tests, the algorithm reduced processing time significantly, by up to 70%, without altering the original data.
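The record does not describe how the proposed algorithm works, so the following is only an illustration of the general task named in the abstract, not the author's method. It is a minimal Python sketch, assuming a small in-memory univariate series: detection uses Tukey's interquartile-range (IQR) fences, cleaning replaces each flagged point by linear interpolation between its nearest non-flagged neighbors, and the input list is never modified. The function names `iqr_outlier_indices` and `clean_by_interpolation` are hypothetical, and nothing here reproduces the thesis's Spark-based parallel implementation.

```python
from statistics import quantiles
from typing import List, Sequence


# Hypothetical helper for illustration; not the algorithm from the thesis.
def iqr_outlier_indices(series: Sequence[float], k: float = 1.5) -> List[int]:
    """Return indices of points outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    if len(series) < 2:
        return []  # statistics.quantiles needs at least two data points
    q1, _, q3 = quantiles(series, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, x in enumerate(series) if x < low or x > high]


# Hypothetical helper for illustration; not the algorithm from the thesis.
def clean_by_interpolation(series: Sequence[float], outliers: Sequence[int]) -> List[float]:
    """Return a NEW list in which each outlier is replaced by linear interpolation
    between its nearest non-outlier neighbors; the input is left unchanged."""
    bad = set(outliers)
    cleaned = list(series)  # copy, so the original series is never modified
    good = [i for i in range(len(series)) if i not in bad]
    if not good:
        return cleaned  # nothing trustworthy to interpolate from
    for i in sorted(bad):
        left = max((j for j in good if j < i), default=None)
        right = min((j for j in good if j > i), default=None)
        if left is None:  # outlier at the start: copy the first good value
            cleaned[i] = series[right]
        elif right is None:  # outlier at the end: copy the last good value
            cleaned[i] = series[left]
        else:
            t = (i - left) / (right - left)
            cleaned[i] = series[left] + t * (series[right] - series[left])
    return cleaned
```

For example, for the series `[10.0, 11.0, 10.5, 95.0, 11.2, 10.8, 11.1]`, the fences flag index 3 (the value 95.0), which is replaced by about 10.85, while the original list still holds 95.0. Global fences ignore temporal ordering; for series with trends or seasonality, a rolling-window variant (or a distributed quantile estimate when data are spread across a cluster) would usually be a better fit.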

Item metadata

id	UNSP_4c8d34455b7bc30a61e99998d3ff2045
oai_identifier_str	oai:repositorio.unesp.br:11449/216225
network_acronym_str	UNSP
network_name_str	Repositório Institucional da UNESP
repository_id_str	2946
spelling	Otimização de desempenho de algoritmo para detecção de outliers em séries temporais | Algorithm performance optimization for outlier detection in time series | Subjects: Apache Spark; Computer science; Databases; Time series; Parallel algorithms; Data cleaning; Outlier detection | English and Portuguese abstracts (equivalent in content to the abstract above) | Funding: none received (Não recebi financiamento) | Universidade Estadual Paulista (Unesp) | Contributor: Valêncio, Carlos Roberto [UNESP], Universidade Estadual Paulista (Unesp) | Author: Lima, Beatriz Ferreira de | Dates: 2022-01-31T21:02:04Z; 2022-01-21 | info:eu-repo/semantics/publishedVersion | info:eu-repo/semantics/bachelorThesis | application/pdf | http://hdl.handle.net/11449/216225 | por | info:eu-repo/semantics/openAccess | reponame:Repositório Institucional da UNESP | instname:Universidade Estadual Paulista (UNESP) | instacron:UNESP | Last modified: 2023-10-30T06:08:08Z | oai:repositorio.unesp.br:11449/216225 | Repositório Institucional (PUB) | http://repositorio.unesp.br/oai/request | opendoar:2946 | Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)
dc.title.none.fl_str_mv	Otimização de desempenho de algoritmo para detecção de outliers em séries temporais Algorithm performance optimization for outlier detection in time series
title	Otimização de desempenho de algoritmo para detecção de outliers em séries temporais
author	Lima, Beatriz Ferreira de
author_role	author
dc.contributor.none.fl_str_mv	Valêncio, Carlos Roberto [UNESP] Universidade Estadual Paulista (Unesp)
dc.contributor.author.fl_str_mv	Lima, Beatriz Ferreira de
dc.subject.por.fl_str_mv	Apache Spark; Computer science; Databases; Time series; Parallel algorithms; Data cleaning; Outlier detection
publishDate	2022
dc.date.none.fl_str_mv	2022-01-31T21:02:04Z 2022-01-31T21:02:04Z 2022-01-21
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/bachelorThesis
format	bachelorThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/11449/216225
dc.language.iso.fl_str_mv	por
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Estadual Paulista (Unesp)
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UNESP instname:Universidade Estadual Paulista (UNESP) instacron:UNESP
repository.name.fl_str_mv	Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)
_version_	1803649567714967552