Obtenção de padrões sequenciais em data streams atendendo requisitos do Big Data
Autor(a) principal: | |
---|---|
Data de Publicação: | 2016 |
Tipo de documento: | Dissertação |
Idioma: | por |
Título da fonte: | Repositório Institucional da UFSCAR |
Texto Completo: | https://repositorio.ufscar.br/handle/ufscar/8280 |
Resumo: | The growing amount of data produced daily, by both businesses and individuals in the web, increased the demand for analysis and extraction of knowledge of this data. While the last two decades the solution was to store and perform data mining algorithms, currently it has become unviable even to supercomputers. In addition, the requirements of the Big Data age go far beyond the large amount of data to analyze. Response time requirements and complexity of the data acquire more weight in many areas in the real world. New models have been researched and developed, often proposing distributed computing or different ways to handle the data stream mining. Current researches shows that an alternative in the data stream mining is to join a real-time event handling mechanism with a classic mining association rules or sequential patterns algorithms. In this work is shown a data stream mining approach to meet the Big Data response time requirement, linking the event handling mechanism in real time Esper and Incremental Miner of Stretchy Time Sequences (IncMSTS) algorithm. The results show that is possible to take a static data mining algorithm for data stream environment and keep tendency in the patterns, although not possible to continuously read all data coming into the data stream. |
id |
SCAR_9e0e84a88c64313089a34c5982705110 |
---|---|
oai_identifier_str |
oai:repositorio.ufscar.br:ufscar/8280 |
network_acronym_str |
SCAR |
network_name_str |
Repositório Institucional da UFSCAR |
repository_id_str |
4322 |
spelling |
Carvalho, Danilo CodecoSantos, Marilde Terezinha Pradohttp://lattes.cnpq.br/9826026025118073http://lattes.cnpq.br/28249917156133705b32a537-fc27-4610-a652-54b8da3419452016-11-08T18:42:49Z2016-11-08T18:42:49Z2016-06-06CARVALHO, Danilo Codeco. Obtenção de padrões sequenciais em data streams atendendo requisitos do Big Data. 2016. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de São Carlos, São Carlos, 2016. Disponível em: https://repositorio.ufscar.br/handle/ufscar/8280.https://repositorio.ufscar.br/handle/ufscar/8280The growing amount of data produced daily, by both businesses and individuals in the web, increased the demand for analysis and extraction of knowledge of this data. While the last two decades the solution was to store and perform data mining algorithms, currently it has become unviable even to supercomputers. In addition, the requirements of the Big Data age go far beyond the large amount of data to analyze. Response time requirements and complexity of the data acquire more weight in many areas in the real world. New models have been researched and developed, often proposing distributed computing or different ways to handle the data stream mining. Current researches shows that an alternative in the data stream mining is to join a real-time event handling mechanism with a classic mining association rules or sequential patterns algorithms. In this work is shown a data stream mining approach to meet the Big Data response time requirement, linking the event handling mechanism in real time Esper and Incremental Miner of Stretchy Time Sequences (IncMSTS) algorithm. The results show that is possible to take a static data mining algorithm for data stream environment and keep tendency in the patterns, although not possible to continuously read all data coming into the data stream.O crescimento da quantidade de dados produzidos diariamente, tanto por empresas como por indivíduos na web, aumentou a exigência para a análise e extração de conhecimento sobre esses dados. Enquanto nas duas últimas décadas a solução era armazenar e executar algoritmos de mineração de dados, atualmente isso se tornou inviável mesmo em super computadores. Além disso, os requisitos da chamada era do Big Data vão muito além da grande quantidade de dados a se analisar. Requisitos de tempo de resposta e complexidade dos dados adquirem maior peso em muitos domínios no mundo real. Novos modelos têm sido pesquisados e desenvolvidos, muitas vezes propondo computação distribuída ou diferentes formas de se tratar a mineração de fluxo de dados. Pesquisas atuais mostram que uma alternativa na mineração de fluxo de dados é unir um mecanismo de tratamento de eventos em tempo real com algoritmos clássicos de mineração de regras de associação ou padrões sequenciais. Neste trabalho é mostrada uma abordagem de mineração de fluxo de dados (data stream) para atender ao requisito de tempo de resposta do Big Data, que une o mecanismo de manipulação de eventos em tempo real Esper e o algoritmo Incremental Miner of Stretchy Time Sequences (IncMSTS). Os resultados mostram ser possível levar um algoritmo de mineração de dados estático para o ambiente de fluxo de dados e manter as tendências de padrões encontrados, mesmo não sendo possível ler todos os dados vindos continuamente no fluxo de dados.Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)porUniversidade Federal de São CarlosCâmpus São CarlosPrograma de Pós-Graduação em Ciência da Computação - PPGCCUFSCarMineração de dadosMineração no Big DataMineração de data streamsMineração em fluxos de dadosProcessamento de eventos complexosData miningMining Big DataData stream miningComplex event processingSliding windowSequential pattern miningAssociation rule miningCIENCIAS EXATAS E DA TERRAObtenção de padrões sequenciais em data streams atendendo requisitos do Big Datainfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesisOnline6006001bdb200e-99c1-45c7-8e62-ff292489211einfo:eu-repo/semantics/openAccessreponame:Repositório Institucional da UFSCARinstname:Universidade Federal de São Carlos (UFSCAR)instacron:UFSCARORIGINALDissDCC.pdfDissDCC.pdfapplication/pdf2421455https://repositorio.ufscar.br/bitstream/ufscar/8280/1/DissDCC.pdf5fd16625959b31340d5f845754f109ceMD51LICENSElicense.txtlicense.txttext/plain; charset=utf-81957https://repositorio.ufscar.br/bitstream/ufscar/8280/2/license.txtae0398b6f8b235e40ad82cba6c50031dMD52TEXTDissDCC.pdf.txtDissDCC.pdf.txtExtracted texttext/plain151132https://repositorio.ufscar.br/bitstream/ufscar/8280/3/DissDCC.pdf.txt9eee3b64a33f7f4b47629beeecefd06dMD53THUMBNAILDissDCC.pdf.jpgDissDCC.pdf.jpgIM Thumbnailimage/jpeg7860https://repositorio.ufscar.br/bitstream/ufscar/8280/4/DissDCC.pdf.jpg6c3ff03b9386f109d66b8028bc6914e8MD54ufscar/82802023-09-18 18:31:01.124oai:repositorio.ufscar.br:ufscar/8280TElDRU7Dh0EgREUgRElTVFJJQlVJw4fDg08gTsODTy1FWENMVVNJVkEKCkNvbSBhIGFwcmVzZW50YcOnw6NvIGRlc3RhIGxpY2Vuw6dhLCB2b2PDqiAobyBhdXRvciAoZXMpIG91IG8gdGl0dWxhciBkb3MgZGlyZWl0b3MgZGUgYXV0b3IpIGNvbmNlZGUgw6AgVW5pdmVyc2lkYWRlCkZlZGVyYWwgZGUgU8OjbyBDYXJsb3MgbyBkaXJlaXRvIG7Do28tZXhjbHVzaXZvIGRlIHJlcHJvZHV6aXIsICB0cmFkdXppciAoY29uZm9ybWUgZGVmaW5pZG8gYWJhaXhvKSwgZS9vdQpkaXN0cmlidWlyIGEgc3VhIHRlc2Ugb3UgZGlzc2VydGHDp8OjbyAoaW5jbHVpbmRvIG8gcmVzdW1vKSBwb3IgdG9kbyBvIG11bmRvIG5vIGZvcm1hdG8gaW1wcmVzc28gZSBlbGV0csO0bmljbyBlCmVtIHF1YWxxdWVyIG1laW8sIGluY2x1aW5kbyBvcyBmb3JtYXRvcyDDoXVkaW8gb3UgdsOtZGVvLgoKVm9jw6ogY29uY29yZGEgcXVlIGEgVUZTQ2FyIHBvZGUsIHNlbSBhbHRlcmFyIG8gY29udGXDumRvLCB0cmFuc3BvciBhIHN1YSB0ZXNlIG91IGRpc3NlcnRhw6fDo28KcGFyYSBxdWFscXVlciBtZWlvIG91IGZvcm1hdG8gcGFyYSBmaW5zIGRlIHByZXNlcnZhw6fDo28uCgpWb2PDqiB0YW1iw6ltIGNvbmNvcmRhIHF1ZSBhIFVGU0NhciBwb2RlIG1hbnRlciBtYWlzIGRlIHVtYSBjw7NwaWEgYSBzdWEgdGVzZSBvdQpkaXNzZXJ0YcOnw6NvIHBhcmEgZmlucyBkZSBzZWd1cmFuw6dhLCBiYWNrLXVwIGUgcHJlc2VydmHDp8Ojby4KClZvY8OqIGRlY2xhcmEgcXVlIGEgc3VhIHRlc2Ugb3UgZGlzc2VydGHDp8OjbyDDqSBvcmlnaW5hbCBlIHF1ZSB2b2PDqiB0ZW0gbyBwb2RlciBkZSBjb25jZWRlciBvcyBkaXJlaXRvcyBjb250aWRvcwpuZXN0YSBsaWNlbsOnYS4gVm9jw6ogdGFtYsOpbSBkZWNsYXJhIHF1ZSBvIGRlcMOzc2l0byBkYSBzdWEgdGVzZSBvdSBkaXNzZXJ0YcOnw6NvIG7Do28sIHF1ZSBzZWphIGRlIHNldQpjb25oZWNpbWVudG8sIGluZnJpbmdlIGRpcmVpdG9zIGF1dG9yYWlzIGRlIG5pbmd1w6ltLgoKQ2FzbyBhIHN1YSB0ZXNlIG91IGRpc3NlcnRhw6fDo28gY29udGVuaGEgbWF0ZXJpYWwgcXVlIHZvY8OqIG7Do28gcG9zc3VpIGEgdGl0dWxhcmlkYWRlIGRvcyBkaXJlaXRvcyBhdXRvcmFpcywgdm9jw6oKZGVjbGFyYSBxdWUgb2J0ZXZlIGEgcGVybWlzc8OjbyBpcnJlc3RyaXRhIGRvIGRldGVudG9yIGRvcyBkaXJlaXRvcyBhdXRvcmFpcyBwYXJhIGNvbmNlZGVyIMOgIFVGU0NhcgpvcyBkaXJlaXRvcyBhcHJlc2VudGFkb3MgbmVzdGEgbGljZW7Dp2EsIGUgcXVlIGVzc2UgbWF0ZXJpYWwgZGUgcHJvcHJpZWRhZGUgZGUgdGVyY2Vpcm9zIGVzdMOhIGNsYXJhbWVudGUKaWRlbnRpZmljYWRvIGUgcmVjb25oZWNpZG8gbm8gdGV4dG8gb3Ugbm8gY29udGXDumRvIGRhIHRlc2Ugb3UgZGlzc2VydGHDp8OjbyBvcmEgZGVwb3NpdGFkYS4KCkNBU08gQSBURVNFIE9VIERJU1NFUlRBw4fDg08gT1JBIERFUE9TSVRBREEgVEVOSEEgU0lETyBSRVNVTFRBRE8gREUgVU0gUEFUUk9Dw41OSU8gT1UKQVBPSU8gREUgVU1BIEFHw4pOQ0lBIERFIEZPTUVOVE8gT1UgT1VUUk8gT1JHQU5JU01PIFFVRSBOw4NPIFNFSkEgQSBVRlNDYXIsClZPQ8OKIERFQ0xBUkEgUVVFIFJFU1BFSVRPVSBUT0RPUyBFIFFVQUlTUVVFUiBESVJFSVRPUyBERSBSRVZJU8ODTyBDT01PClRBTULDiU0gQVMgREVNQUlTIE9CUklHQcOHw5VFUyBFWElHSURBUyBQT1IgQ09OVFJBVE8gT1UgQUNPUkRPLgoKQSBVRlNDYXIgc2UgY29tcHJvbWV0ZSBhIGlkZW50aWZpY2FyIGNsYXJhbWVudGUgbyBzZXUgbm9tZSAocykgb3UgbyhzKSBub21lKHMpIGRvKHMpCmRldGVudG9yKGVzKSBkb3MgZGlyZWl0b3MgYXV0b3JhaXMgZGEgdGVzZSBvdSBkaXNzZXJ0YcOnw6NvLCBlIG7Do28gZmFyw6EgcXVhbHF1ZXIgYWx0ZXJhw6fDo28sIGFsw6ltIGRhcXVlbGFzCmNvbmNlZGlkYXMgcG9yIGVzdGEgbGljZW7Dp2EuCg==Repositório InstitucionalPUBhttps://repositorio.ufscar.br/oai/requestopendoar:43222023-09-18T18:31:01Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR)false |
dc.title.por.fl_str_mv |
Obtenção de padrões sequenciais em data streams atendendo requisitos do Big Data |
title |
Obtenção de padrões sequenciais em data streams atendendo requisitos do Big Data |
spellingShingle |
Obtenção de padrões sequenciais em data streams atendendo requisitos do Big Data Carvalho, Danilo Codeco Mineração de dados Mineração no Big Data Mineração de data streams Mineração em fluxos de dados Processamento de eventos complexos Data mining Mining Big Data Data stream mining Complex event processing Sliding window Sequential pattern mining Association rule mining CIENCIAS EXATAS E DA TERRA |
title_short |
Obtenção de padrões sequenciais em data streams atendendo requisitos do Big Data |
title_full |
Obtenção de padrões sequenciais em data streams atendendo requisitos do Big Data |
title_fullStr |
Obtenção de padrões sequenciais em data streams atendendo requisitos do Big Data |
title_full_unstemmed |
Obtenção de padrões sequenciais em data streams atendendo requisitos do Big Data |
title_sort |
Obtenção de padrões sequenciais em data streams atendendo requisitos do Big Data |
author |
Carvalho, Danilo Codeco |
author_facet |
Carvalho, Danilo Codeco |
author_role |
author |
dc.contributor.authorlattes.por.fl_str_mv |
http://lattes.cnpq.br/2824991715613370 |
dc.contributor.author.fl_str_mv |
Carvalho, Danilo Codeco |
dc.contributor.advisor1.fl_str_mv |
Santos, Marilde Terezinha Prado |
dc.contributor.advisor1Lattes.fl_str_mv |
http://lattes.cnpq.br/9826026025118073 |
dc.contributor.authorID.fl_str_mv |
5b32a537-fc27-4610-a652-54b8da341945 |
contributor_str_mv |
Santos, Marilde Terezinha Prado |
dc.subject.por.fl_str_mv |
Mineração de dados Mineração no Big Data Mineração de data streams Mineração em fluxos de dados Processamento de eventos complexos |
topic |
Mineração de dados Mineração no Big Data Mineração de data streams Mineração em fluxos de dados Processamento de eventos complexos Data mining Mining Big Data Data stream mining Complex event processing Sliding window Sequential pattern mining Association rule mining CIENCIAS EXATAS E DA TERRA |
dc.subject.eng.fl_str_mv |
Data mining Mining Big Data Data stream mining Complex event processing Sliding window Sequential pattern mining Association rule mining |
dc.subject.cnpq.fl_str_mv |
CIENCIAS EXATAS E DA TERRA |
description |
The growing amount of data produced daily, by both businesses and individuals in the web, increased the demand for analysis and extraction of knowledge of this data. While the last two decades the solution was to store and perform data mining algorithms, currently it has become unviable even to supercomputers. In addition, the requirements of the Big Data age go far beyond the large amount of data to analyze. Response time requirements and complexity of the data acquire more weight in many areas in the real world. New models have been researched and developed, often proposing distributed computing or different ways to handle the data stream mining. Current researches shows that an alternative in the data stream mining is to join a real-time event handling mechanism with a classic mining association rules or sequential patterns algorithms. In this work is shown a data stream mining approach to meet the Big Data response time requirement, linking the event handling mechanism in real time Esper and Incremental Miner of Stretchy Time Sequences (IncMSTS) algorithm. The results show that is possible to take a static data mining algorithm for data stream environment and keep tendency in the patterns, although not possible to continuously read all data coming into the data stream. |
publishDate |
2016 |
dc.date.accessioned.fl_str_mv |
2016-11-08T18:42:49Z |
dc.date.available.fl_str_mv |
2016-11-08T18:42:49Z |
dc.date.issued.fl_str_mv |
2016-06-06 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
CARVALHO, Danilo Codeco. Obtenção de padrões sequenciais em data streams atendendo requisitos do Big Data. 2016. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de São Carlos, São Carlos, 2016. Disponível em: https://repositorio.ufscar.br/handle/ufscar/8280. |
dc.identifier.uri.fl_str_mv |
https://repositorio.ufscar.br/handle/ufscar/8280 |
identifier_str_mv |
CARVALHO, Danilo Codeco. Obtenção de padrões sequenciais em data streams atendendo requisitos do Big Data. 2016. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de São Carlos, São Carlos, 2016. Disponível em: https://repositorio.ufscar.br/handle/ufscar/8280. |
url |
https://repositorio.ufscar.br/handle/ufscar/8280 |
dc.language.iso.fl_str_mv |
por |
language |
por |
dc.relation.confidence.fl_str_mv |
600 600 |
dc.relation.authority.fl_str_mv |
1bdb200e-99c1-45c7-8e62-ff292489211e |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
Universidade Federal de São Carlos Câmpus São Carlos |
dc.publisher.program.fl_str_mv |
Programa de Pós-Graduação em Ciência da Computação - PPGCC |
dc.publisher.initials.fl_str_mv |
UFSCar |
publisher.none.fl_str_mv |
Universidade Federal de São Carlos Câmpus São Carlos |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFSCAR instname:Universidade Federal de São Carlos (UFSCAR) instacron:UFSCAR |
instname_str |
Universidade Federal de São Carlos (UFSCAR) |
instacron_str |
UFSCAR |
institution |
UFSCAR |
reponame_str |
Repositório Institucional da UFSCAR |
collection |
Repositório Institucional da UFSCAR |
bitstream.url.fl_str_mv |
https://repositorio.ufscar.br/bitstream/ufscar/8280/1/DissDCC.pdf https://repositorio.ufscar.br/bitstream/ufscar/8280/2/license.txt https://repositorio.ufscar.br/bitstream/ufscar/8280/3/DissDCC.pdf.txt https://repositorio.ufscar.br/bitstream/ufscar/8280/4/DissDCC.pdf.jpg |
bitstream.checksum.fl_str_mv |
5fd16625959b31340d5f845754f109ce ae0398b6f8b235e40ad82cba6c50031d 9eee3b64a33f7f4b47629beeecefd06d 6c3ff03b9386f109d66b8028bc6914e8 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR) |
repository.mail.fl_str_mv |
|
_version_ |
1813715569830002688 |