MaSTA: a text-based machine learning approach for systems-of-systems in the big data context

Bibliographic details
Main author: Bianchi, Thiago
Publication date: 2019
Document type: Thesis
Language: eng
Source title: Biblioteca Digital de Teses e Dissertações da USP
Full text: http://www.teses.usp.br/teses/disponiveis/55/55134/tde-11092019-144236/
Abstract: Systems-of-systems (SoS) have gained a very important status in industry and academia as an answer to the growing complexity of software-intensive systems. SoS are particular in the sense that their capabilities transcend the mere sum of the capacities of their diverse independent constituents. In parallel, the current growth in the amount of data collected in different formats is impressive and poses a considerable challenge for researchers and professionals, thus characterizing the Big Data context. In this scenario, Machine Learning techniques have been increasingly explored to analyze and extract relevant knowledge from such data. SoS have also generated a large amount of data and textual information and, in many situations, users of SoS need to manually register unstructured, critical texts, e.g., work orders and service requests, and also need to map them to structured information. These are repetitive, time- and effort-consuming, and even error-prone tasks. The main objective of this thesis is to present MaSTA, an approach composed of an innovative classification method to infer classifiers from large textual collections and an evaluation method that measures the reliability and performance levels of such classifiers. To evaluate the effectiveness of MaSTA, we conducted an experiment with a commercial SoS used by large companies that provided us with four datasets containing nearly one million records related to three classification tasks. The experiment indicated that MaSTA is capable of automatically classifying the documents and also of improving user assertiveness by reducing the list of possible classifications. Moreover, the experiment indicated that MaSTA is a scalable solution for Big Data scenarios in which document collections have hundreds of thousands (even millions) of documents, including those produced by different constituents of an SoS.
id USP_2755e11e650c2dba9278d5c3851393c2
oai_identifier_str oai:teses.usp.br:tde-11092019-144236
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
repository_id_str 2721
dc.title.none.fl_str_mv MaSTA: a text-based machine learning approach for systems-of-systems in the big data context
MaSTA: uma abordagem de aprendizado de máquina orientado a textos para sistemas-de-sistemas no contexto de big data
title MaSTA: a text-based machine learning approach for systems-of-systems in the big data context
author Bianchi, Thiago
author_facet Bianchi, Thiago
author_role author
dc.contributor.none.fl_str_mv Nakagawa, Elisa Yumi
dc.contributor.author.fl_str_mv Bianchi, Thiago
dc.subject.por.fl_str_mv Big Data
Machine learning
Naive Bayes
System-of-systems
Text classification
description Systems-of-systems (SoS) have gained a very important status in industry and academia as an answer to the growing complexity of software-intensive systems. SoS are particular in the sense that their capabilities transcend the mere sum of the capacities of their diverse independent constituents. In parallel, the current growth in the amount of data collected in different formats is impressive and poses a considerable challenge for researchers and professionals, thus characterizing the Big Data context. In this scenario, Machine Learning techniques have been increasingly explored to analyze and extract relevant knowledge from such data. SoS have also generated a large amount of data and textual information and, in many situations, users of SoS need to manually register unstructured, critical texts, e.g., work orders and service requests, and also need to map them to structured information. These are repetitive, time- and effort-consuming, and even error-prone tasks. The main objective of this thesis is to present MaSTA, an approach composed of an innovative classification method to infer classifiers from large textual collections and an evaluation method that measures the reliability and performance levels of such classifiers. To evaluate the effectiveness of MaSTA, we conducted an experiment with a commercial SoS used by large companies that provided us with four datasets containing nearly one million records related to three classification tasks. The experiment indicated that MaSTA is capable of automatically classifying the documents and also of improving user assertiveness by reducing the list of possible classifications. Moreover, the experiment indicated that MaSTA is a scalable solution for Big Data scenarios in which document collections have hundreds of thousands (even millions) of documents, including those produced by different constituents of an SoS.
publishDate 2019
dc.date.none.fl_str_mv 2019-04-11
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://www.teses.usp.br/teses/disponiveis/55/55134/tde-11092019-144236/
url http://www.teses.usp.br/teses/disponiveis/55/55134/tde-11092019-144236/
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv Release the content for public access.
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Release the content for public access.
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
instname_str Universidade de São Paulo (USP)
instacron_str USP
institution USP
reponame_str Biblioteca Digital de Teses e Dissertações da USP
collection Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br
_version_ 1809090626938994688