Obtenção de padrões sequenciais em data streams atendendo requisitos do Big Data

Detalhes bibliográficos
Autor(a) principal: Carvalho, Danilo Codeco
Data de Publicação: 2016
Tipo de documento: Dissertação
Idioma: por
Título da fonte: Repositório Institucional da UFSCAR
Texto Completo: https://repositorio.ufscar.br/handle/ufscar/8280
Resumo: The growing amount of data produced daily, by both businesses and individuals in the web, increased the demand for analysis and extraction of knowledge of this data. While the last two decades the solution was to store and perform data mining algorithms, currently it has become unviable even to supercomputers. In addition, the requirements of the Big Data age go far beyond the large amount of data to analyze. Response time requirements and complexity of the data acquire more weight in many areas in the real world. New models have been researched and developed, often proposing distributed computing or different ways to handle the data stream mining. Current researches shows that an alternative in the data stream mining is to join a real-time event handling mechanism with a classic mining association rules or sequential patterns algorithms. In this work is shown a data stream mining approach to meet the Big Data response time requirement, linking the event handling mechanism in real time Esper and Incremental Miner of Stretchy Time Sequences (IncMSTS) algorithm. The results show that is possible to take a static data mining algorithm for data stream environment and keep tendency in the patterns, although not possible to continuously read all data coming into the data stream.
id SCAR_9e0e84a88c64313089a34c5982705110
oai_identifier_str oai:repositorio.ufscar.br:ufscar/8280
network_acronym_str SCAR
network_name_str Repositório Institucional da UFSCAR
repository_id_str 4322
spelling Carvalho, Danilo CodecoSantos, Marilde Terezinha Pradohttp://lattes.cnpq.br/9826026025118073http://lattes.cnpq.br/28249917156133705b32a537-fc27-4610-a652-54b8da3419452016-11-08T18:42:49Z2016-11-08T18:42:49Z2016-06-06CARVALHO, Danilo Codeco. Obtenção de padrões sequenciais em data streams atendendo requisitos do Big Data. 2016. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de São Carlos, São Carlos, 2016. Disponível em: https://repositorio.ufscar.br/handle/ufscar/8280.https://repositorio.ufscar.br/handle/ufscar/8280The growing amount of data produced daily, by both businesses and individuals in the web, increased the demand for analysis and extraction of knowledge of this data. While the last two decades the solution was to store and perform data mining algorithms, currently it has become unviable even to supercomputers. In addition, the requirements of the Big Data age go far beyond the large amount of data to analyze. Response time requirements and complexity of the data acquire more weight in many areas in the real world. New models have been researched and developed, often proposing distributed computing or different ways to handle the data stream mining. Current researches shows that an alternative in the data stream mining is to join a real-time event handling mechanism with a classic mining association rules or sequential patterns algorithms. In this work is shown a data stream mining approach to meet the Big Data response time requirement, linking the event handling mechanism in real time Esper and Incremental Miner of Stretchy Time Sequences (IncMSTS) algorithm. The results show that is possible to take a static data mining algorithm for data stream environment and keep tendency in the patterns, although not possible to continuously read all data coming into the data stream.O crescimento da quantidade de dados produzidos diariamente, tanto por empresas como por indivíduos na web, aumentou a exigência para a análise e extração de conhecimento sobre esses dados. Enquanto nas duas últimas décadas a solução era armazenar e executar algoritmos de mineração de dados, atualmente isso se tornou inviável mesmo em super computadores. Além disso, os requisitos da chamada era do Big Data vão muito além da grande quantidade de dados a se analisar. Requisitos de tempo de resposta e complexidade dos dados adquirem maior peso em muitos domínios no mundo real. Novos modelos têm sido pesquisados e desenvolvidos, muitas vezes propondo computação distribuída ou diferentes formas de se tratar a mineração de fluxo de dados. Pesquisas atuais mostram que uma alternativa na mineração de fluxo de dados é unir um mecanismo de tratamento de eventos em tempo real com algoritmos clássicos de mineração de regras de associação ou padrões sequenciais. Neste trabalho é mostrada uma abordagem de mineração de fluxo de dados (data stream) para atender ao requisito de tempo de resposta do Big Data, que une o mecanismo de manipulação de eventos em tempo real Esper e o algoritmo Incremental Miner of Stretchy Time Sequences (IncMSTS). Os resultados mostram ser possível levar um algoritmo de mineração de dados estático para o ambiente de fluxo de dados e manter as tendências de padrões encontrados, mesmo não sendo possível ler todos os dados vindos continuamente no fluxo de dados.Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)porUniversidade Federal de São CarlosCâmpus São CarlosPrograma de Pós-Graduação em Ciência da Computação - PPGCCUFSCarMineração de dadosMineração no Big DataMineração de data streamsMineração em fluxos de dadosProcessamento de eventos complexosData miningMining Big DataData stream miningComplex event processingSliding windowSequential pattern miningAssociation rule miningCIENCIAS EXATAS E DA TERRAObtenção de padrões sequenciais em data streams atendendo requisitos do Big Datainfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesisOnline6006001bdb200e-99c1-45c7-8e62-ff292489211einfo:eu-repo/semantics/openAccessreponame:Repositório Institucional da UFSCARinstname:Universidade Federal de São Carlos (UFSCAR)instacron:UFSCARORIGINALDissDCC.pdfDissDCC.pdfapplication/pdf2421455https://repositorio.ufscar.br/bitstream/ufscar/8280/1/DissDCC.pdf5fd16625959b31340d5f845754f109ceMD51LICENSElicense.txtlicense.txttext/plain; charset=utf-81957https://repositorio.ufscar.br/bitstream/ufscar/8280/2/license.txtae0398b6f8b235e40ad82cba6c50031dMD52TEXTDissDCC.pdf.txtDissDCC.pdf.txtExtracted texttext/plain151132https://repositorio.ufscar.br/bitstream/ufscar/8280/3/DissDCC.pdf.txt9eee3b64a33f7f4b47629beeecefd06dMD53THUMBNAILDissDCC.pdf.jpgDissDCC.pdf.jpgIM Thumbnailimage/jpeg7860https://repositorio.ufscar.br/bitstream/ufscar/8280/4/DissDCC.pdf.jpg6c3ff03b9386f109d66b8028bc6914e8MD54ufscar/82802023-09-18 18:31:01.124oai:repositorio.ufscar.br:ufscar/8280TElDRU7Dh0EgREUgRElTVFJJQlVJw4fDg08gTsODTy1FWENMVVNJVkEKCkNvbSBhIGFwcmVzZW50YcOnw6NvIGRlc3RhIGxpY2Vuw6dhLCB2b2PDqiAobyBhdXRvciAoZXMpIG91IG8gdGl0dWxhciBkb3MgZGlyZWl0b3MgZGUgYXV0b3IpIGNvbmNlZGUgw6AgVW5pdmVyc2lkYWRlCkZlZGVyYWwgZGUgU8OjbyBDYXJsb3MgbyBkaXJlaXRvIG7Do28tZXhjbHVzaXZvIGRlIHJlcHJvZHV6aXIsICB0cmFkdXppciAoY29uZm9ybWUgZGVmaW5pZG8gYWJhaXhvKSwgZS9vdQpkaXN0cmlidWlyIGEgc3VhIHRlc2Ugb3UgZGlzc2VydGHDp8OjbyAoaW5jbHVpbmRvIG8gcmVzdW1vKSBwb3IgdG9kbyBvIG11bmRvIG5vIGZvcm1hdG8gaW1wcmVzc28gZSBlbGV0csO0bmljbyBlCmVtIHF1YWxxdWVyIG1laW8sIGluY2x1aW5kbyBvcyBmb3JtYXRvcyDDoXVkaW8gb3UgdsOtZGVvLgoKVm9jw6ogY29uY29yZGEgcXVlIGEgVUZTQ2FyIHBvZGUsIHNlbSBhbHRlcmFyIG8gY29udGXDumRvLCB0cmFuc3BvciBhIHN1YSB0ZXNlIG91IGRpc3NlcnRhw6fDo28KcGFyYSBxdWFscXVlciBtZWlvIG91IGZvcm1hdG8gcGFyYSBmaW5zIGRlIHByZXNlcnZhw6fDo28uCgpWb2PDqiB0YW1iw6ltIGNvbmNvcmRhIHF1ZSBhIFVGU0NhciBwb2RlIG1hbnRlciBtYWlzIGRlIHVtYSBjw7NwaWEgYSBzdWEgdGVzZSBvdQpkaXNzZXJ0YcOnw6NvIHBhcmEgZmlucyBkZSBzZWd1cmFuw6dhLCBiYWNrLXVwIGUgcHJlc2VydmHDp8Ojby4KClZvY8OqIGRlY2xhcmEgcXVlIGEgc3VhIHRlc2Ugb3UgZGlzc2VydGHDp8OjbyDDqSBvcmlnaW5hbCBlIHF1ZSB2b2PDqiB0ZW0gbyBwb2RlciBkZSBjb25jZWRlciBvcyBkaXJlaXRvcyBjb250aWRvcwpuZXN0YSBsaWNlbsOnYS4gVm9jw6ogdGFtYsOpbSBkZWNsYXJhIHF1ZSBvIGRlcMOzc2l0byBkYSBzdWEgdGVzZSBvdSBkaXNzZXJ0YcOnw6NvIG7Do28sIHF1ZSBzZWphIGRlIHNldQpjb25oZWNpbWVudG8sIGluZnJpbmdlIGRpcmVpdG9zIGF1dG9yYWlzIGRlIG5pbmd1w6ltLgoKQ2FzbyBhIHN1YSB0ZXNlIG91IGRpc3NlcnRhw6fDo28gY29udGVuaGEgbWF0ZXJpYWwgcXVlIHZvY8OqIG7Do28gcG9zc3VpIGEgdGl0dWxhcmlkYWRlIGRvcyBkaXJlaXRvcyBhdXRvcmFpcywgdm9jw6oKZGVjbGFyYSBxdWUgb2J0ZXZlIGEgcGVybWlzc8OjbyBpcnJlc3RyaXRhIGRvIGRldGVudG9yIGRvcyBkaXJlaXRvcyBhdXRvcmFpcyBwYXJhIGNvbmNlZGVyIMOgIFVGU0NhcgpvcyBkaXJlaXRvcyBhcHJlc2VudGFkb3MgbmVzdGEgbGljZW7Dp2EsIGUgcXVlIGVzc2UgbWF0ZXJpYWwgZGUgcHJvcHJpZWRhZGUgZGUgdGVyY2Vpcm9zIGVzdMOhIGNsYXJhbWVudGUKaWRlbnRpZmljYWRvIGUgcmVjb25oZWNpZG8gbm8gdGV4dG8gb3Ugbm8gY29udGXDumRvIGRhIHRlc2Ugb3UgZGlzc2VydGHDp8OjbyBvcmEgZGVwb3NpdGFkYS4KCkNBU08gQSBURVNFIE9VIERJU1NFUlRBw4fDg08gT1JBIERFUE9TSVRBREEgVEVOSEEgU0lETyBSRVNVTFRBRE8gREUgVU0gUEFUUk9Dw41OSU8gT1UKQVBPSU8gREUgVU1BIEFHw4pOQ0lBIERFIEZPTUVOVE8gT1UgT1VUUk8gT1JHQU5JU01PIFFVRSBOw4NPIFNFSkEgQSBVRlNDYXIsClZPQ8OKIERFQ0xBUkEgUVVFIFJFU1BFSVRPVSBUT0RPUyBFIFFVQUlTUVVFUiBESVJFSVRPUyBERSBSRVZJU8ODTyBDT01PClRBTULDiU0gQVMgREVNQUlTIE9CUklHQcOHw5VFUyBFWElHSURBUyBQT1IgQ09OVFJBVE8gT1UgQUNPUkRPLgoKQSBVRlNDYXIgc2UgY29tcHJvbWV0ZSBhIGlkZW50aWZpY2FyIGNsYXJhbWVudGUgbyBzZXUgbm9tZSAocykgb3UgbyhzKSBub21lKHMpIGRvKHMpCmRldGVudG9yKGVzKSBkb3MgZGlyZWl0b3MgYXV0b3JhaXMgZGEgdGVzZSBvdSBkaXNzZXJ0YcOnw6NvLCBlIG7Do28gZmFyw6EgcXVhbHF1ZXIgYWx0ZXJhw6fDo28sIGFsw6ltIGRhcXVlbGFzCmNvbmNlZGlkYXMgcG9yIGVzdGEgbGljZW7Dp2EuCg==Repositório InstitucionalPUBhttps://repositorio.ufscar.br/oai/requestopendoar:43222023-09-18T18:31:01Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR)false
dc.title.por.fl_str_mv Obtenção de padrões sequenciais em data streams atendendo requisitos do Big Data
title Obtenção de padrões sequenciais em data streams atendendo requisitos do Big Data
spellingShingle Obtenção de padrões sequenciais em data streams atendendo requisitos do Big Data
Carvalho, Danilo Codeco
Mineração de dados
Mineração no Big Data
Mineração de data streams
Mineração em fluxos de dados
Processamento de eventos complexos
Data mining
Mining Big Data
Data stream mining
Complex event processing
Sliding window
Sequential pattern mining
Association rule mining
CIENCIAS EXATAS E DA TERRA
title_short Obtenção de padrões sequenciais em data streams atendendo requisitos do Big Data
title_full Obtenção de padrões sequenciais em data streams atendendo requisitos do Big Data
title_fullStr Obtenção de padrões sequenciais em data streams atendendo requisitos do Big Data
title_full_unstemmed Obtenção de padrões sequenciais em data streams atendendo requisitos do Big Data
title_sort Obtenção de padrões sequenciais em data streams atendendo requisitos do Big Data
author Carvalho, Danilo Codeco
author_facet Carvalho, Danilo Codeco
author_role author
dc.contributor.authorlattes.por.fl_str_mv http://lattes.cnpq.br/2824991715613370
dc.contributor.author.fl_str_mv Carvalho, Danilo Codeco
dc.contributor.advisor1.fl_str_mv Santos, Marilde Terezinha Prado
dc.contributor.advisor1Lattes.fl_str_mv http://lattes.cnpq.br/9826026025118073
dc.contributor.authorID.fl_str_mv 5b32a537-fc27-4610-a652-54b8da341945
contributor_str_mv Santos, Marilde Terezinha Prado
dc.subject.por.fl_str_mv Mineração de dados
Mineração no Big Data
Mineração de data streams
Mineração em fluxos de dados
Processamento de eventos complexos
topic Mineração de dados
Mineração no Big Data
Mineração de data streams
Mineração em fluxos de dados
Processamento de eventos complexos
Data mining
Mining Big Data
Data stream mining
Complex event processing
Sliding window
Sequential pattern mining
Association rule mining
CIENCIAS EXATAS E DA TERRA
dc.subject.eng.fl_str_mv Data mining
Mining Big Data
Data stream mining
Complex event processing
Sliding window
Sequential pattern mining
Association rule mining
dc.subject.cnpq.fl_str_mv CIENCIAS EXATAS E DA TERRA
description The growing amount of data produced daily, by both businesses and individuals in the web, increased the demand for analysis and extraction of knowledge of this data. While the last two decades the solution was to store and perform data mining algorithms, currently it has become unviable even to supercomputers. In addition, the requirements of the Big Data age go far beyond the large amount of data to analyze. Response time requirements and complexity of the data acquire more weight in many areas in the real world. New models have been researched and developed, often proposing distributed computing or different ways to handle the data stream mining. Current researches shows that an alternative in the data stream mining is to join a real-time event handling mechanism with a classic mining association rules or sequential patterns algorithms. In this work is shown a data stream mining approach to meet the Big Data response time requirement, linking the event handling mechanism in real time Esper and Incremental Miner of Stretchy Time Sequences (IncMSTS) algorithm. The results show that is possible to take a static data mining algorithm for data stream environment and keep tendency in the patterns, although not possible to continuously read all data coming into the data stream.
publishDate 2016
dc.date.accessioned.fl_str_mv 2016-11-08T18:42:49Z
dc.date.available.fl_str_mv 2016-11-08T18:42:49Z
dc.date.issued.fl_str_mv 2016-06-06
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv CARVALHO, Danilo Codeco. Obtenção de padrões sequenciais em data streams atendendo requisitos do Big Data. 2016. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de São Carlos, São Carlos, 2016. Disponível em: https://repositorio.ufscar.br/handle/ufscar/8280.
dc.identifier.uri.fl_str_mv https://repositorio.ufscar.br/handle/ufscar/8280
identifier_str_mv CARVALHO, Danilo Codeco. Obtenção de padrões sequenciais em data streams atendendo requisitos do Big Data. 2016. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de São Carlos, São Carlos, 2016. Disponível em: https://repositorio.ufscar.br/handle/ufscar/8280.
url https://repositorio.ufscar.br/handle/ufscar/8280
dc.language.iso.fl_str_mv por
language por
dc.relation.confidence.fl_str_mv 600
600
dc.relation.authority.fl_str_mv 1bdb200e-99c1-45c7-8e62-ff292489211e
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Universidade Federal de São Carlos
Câmpus São Carlos
dc.publisher.program.fl_str_mv Programa de Pós-Graduação em Ciência da Computação - PPGCC
dc.publisher.initials.fl_str_mv UFSCar
publisher.none.fl_str_mv Universidade Federal de São Carlos
Câmpus São Carlos
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFSCAR
instname:Universidade Federal de São Carlos (UFSCAR)
instacron:UFSCAR
instname_str Universidade Federal de São Carlos (UFSCAR)
instacron_str UFSCAR
institution UFSCAR
reponame_str Repositório Institucional da UFSCAR
collection Repositório Institucional da UFSCAR
bitstream.url.fl_str_mv https://repositorio.ufscar.br/bitstream/ufscar/8280/1/DissDCC.pdf
https://repositorio.ufscar.br/bitstream/ufscar/8280/2/license.txt
https://repositorio.ufscar.br/bitstream/ufscar/8280/3/DissDCC.pdf.txt
https://repositorio.ufscar.br/bitstream/ufscar/8280/4/DissDCC.pdf.jpg
bitstream.checksum.fl_str_mv 5fd16625959b31340d5f845754f109ce
ae0398b6f8b235e40ad82cba6c50031d
9eee3b64a33f7f4b47629beeecefd06d
6c3ff03b9386f109d66b8028bc6914e8
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR)
repository.mail.fl_str_mv
_version_ 1813715569830002688