Data mining techniques applied to historical data of industrial processes as a tool to find time intervals suitable for system identification.

Santo, Giulio Cesare Mastrocinque

Bibliographic details
Main author:	Santo, Giulio Cesare Mastrocinque
Publication date:	2020
Document type:	Master's thesis
Language:	eng
Source title:	Biblioteca Digital de Teses e Dissertações da USP
Full text:	https://www.teses.usp.br/teses/disponiveis/3/3139/tde-05032021-111034/
Abstract:	System identification is a set of model estimation techniques traditionally used by industries to improve and optimize their processes. Estimating dynamic process models requires informative and representative data from the system, which are usually generated through physical experiments on the plants. However, such experiments often need to be performed multiple times to produce adequate datasets, which may result in products that are out of specification. On the other hand, the emergence of powerful data storage and management software, together with constant development in data mining and data science, represents a potential paradigm shift in industry, in which robust data-driven solutions can be adopted. The direct use of historical data to extract useful information from industrial processes is central to this work, which proposes a comparison of data mining techniques aimed at finding time intervals with sufficient information to perform system identification. To this end, a detailed review of the literature on the problem is first provided. Then, different mining algorithms are applied to both Single-Input Single-Output and Multiple-Input Multiple-Output systems operating in open loop and in closed loop. Simulated data are used to illustrate how each method works and to validate the expected outcomes in an ideal scenario. Regressive models are then estimated with the obtained intervals, which are used to perform cross-validation. Finally, the proposed methods are applied to real multivariable data from an industrial petrochemical furnace. Results obtained with simulated data show that the proposed data mining strategies allowed the estimation of good models in cross-validation scenarios with 1, 10, 100 and infinite prediction steps. Applications to real data, in turn, proved challenging because of the noisy nature of the data and the scarcity of historical intervals in which all the inputs of the multivariable system are sufficiently active to estimate a model. However, this problem is overcome by using multiple intervals in the estimation process, showing that the adopted algorithms can also produce reasonable models in real scenarios.

Item metadata

id	USP_1940a5ee2ef75250ab059f23337e2bcc
oai_identifier_str	oai:teses.usp.br:tde-05032021-111034
network_acronym_str	USP
network_name_str	Biblioteca Digital de Teses e Dissertações da USP
repository_id_str	2721
dc.title.none.fl_str_mv	Data mining techniques applied to historical data of industrial processes as a tool to find time intervals suitable for system identification. Técnicas de mineração de dados aplicadas a dados históricos de processos industriais como uma ferramenta para encontrar intervalos temporais adequados à identificação de sistemas.
title	Data mining techniques applied to historical data of industrial processes as a tool to find time intervals suitable for system identification.
author	Santo, Giulio Cesare Mastrocinque
author_facet	Santo, Giulio Cesare Mastrocinque
author_role	author
dc.contributor.none.fl_str_mv	Garcia, Claudio
dc.contributor.author.fl_str_mv	Santo, Giulio Cesare Mastrocinque
dc.subject.por.fl_str_mv	Ciência de dados; Condition number; Dados históricos; Data mining; Data quality; Data Science; Data segmentation; Effective rank; Historical data; Identificação de sistemas; Mineração de dados; Multivariable systems; Número de condicionamento; Posto efetivo; Qualidade de dados; Segmentação de dados; Sistemas multivariáveis; System identification
topic	Ciência de dados; Condition number; Dados históricos; Data mining; Data quality; Data Science; Data segmentation; Effective rank; Historical data; Identificação de sistemas; Mineração de dados; Multivariable systems; Número de condicionamento; Posto efetivo; Qualidade de dados; Segmentação de dados; Sistemas multivariáveis; System identification
publishDate	2020
dc.date.none.fl_str_mv	2020-12-07
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://www.teses.usp.br/teses/disponiveis/3/3139/tde-05032021-111034/
url	https://www.teses.usp.br/teses/disponiveis/3/3139/tde-05032021-111034/
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv	Release the content for public access. info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Release the content for public access.
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP
publisher.none.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP
instname_str	Universidade de São Paulo (USP)
instacron_str	USP
institution	USP
reponame_str	Biblioteca Digital de Teses e Dissertações da USP
collection	Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv	virginia@if.usp.br || atendimento@aguia.usp.br
_version_	1809090401002323968
