Automatic document classification in technical logbooks: A comparative study of supervised, weakly supervised and unsupervised machine learning approaches

Bibliographic details
Main author: Schmidt, Robin Karl
Publication date: 2024
Document type: Dissertation
Language: English
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10362/164774
Summary: Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics
id RCAP_83f914966f716d6085390ed6d0c8a913
oai_identifier_str oai:run.unl.pt:10362/164774
network_acronym_str RCAP
repository_id_str 7160
Abstract: This study investigates automatic document classification in technical logbooks and introduces the Labeling Functions by Dependency methodology. Departing from conventional weakly supervised approaches, the method focuses on uncovering linguistic patterns and dependencies within textual data and uses chi-squared tests for statistical validation. Evaluated alongside unsupervised and supervised approaches, Labeling Functions by Dependency demonstrated notable efficacy, and the results highlight the importance of thorough NLP preprocessing. The study contributes insights into weakly supervised learning, emphasizing the pivotal role of linguistic dependencies and preprocessing in achieving accurate document classification. The novel approach opens avenues for machine learning methodologies tailored to unstructured textual data.
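The abstract describes Labeling Functions by Dependency only at a high level: linguistic patterns whose dependence on a document class is validated with chi-squared tests. The Python sketch below illustrates one plausible reading of that idea and is not the dissertation's implementation. A keyword labeling function is kept only when a chi-squared test of independence on a 2×2 term/class contingency table rejects independence and the term is over-represented in the class. The whitespace tokenizer, the function names (`mine_labeling_functions`, `apply_lfs`, `chi2_2x2`, `p_value_df1`, `make_lf`), the significance level, the majority-vote aggregation and the toy logbook entries are all hypothetical assumptions.

```python
# Hypothetical sketch, NOT the dissertation's implementation: keyword labeling
# functions are kept only if a chi-squared test of independence rejects H0.
import math
from collections import Counter
from typing import Callable, Dict, List, Optional, Set, Tuple

# A labeling function returns a class label, or None to abstain.
LabelingFunction = Callable[[str], Optional[str]]


def tokenize(text: str) -> Set[str]:
    # Deliberately naive preprocessing (lowercase + whitespace split); real
    # logbook text would need proper NLP cleaning (abbreviations, typos, lemmas).
    return set(text.lower().split())


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared statistic, no continuity correction, for the table
    [[term & label, term & other], [no term & label, no term & other]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty row or column gives no evidence of dependence
    return n * (a * d - b * c) ** 2 / denom


def p_value_df1(chi2: float) -> float:
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def make_lf(term: str, label: str) -> LabelingFunction:
    def lf(text: str) -> Optional[str]:
        return label if term in tokenize(text) else None
    return lf


def mine_labeling_functions(
    docs: List[str], labels: List[str], alpha: float = 0.05, min_count: int = 2
) -> Dict[Tuple[str, str], LabelingFunction]:
    tokens = [tokenize(doc) for doc in docs]
    vocab = Counter(t for toks in tokens for t in toks)  # document frequency
    lfs: Dict[Tuple[str, str], LabelingFunction] = {}
    for label in sorted(set(labels)):
        n_label = labels.count(label)
        for term, n_term in vocab.items():
            if n_term < min_count:
                continue
            a = sum(1 for toks, y in zip(tokens, labels) if term in toks and y == label)
            b, c = n_term - a, n_label - a
            d = len(docs) - a - b - c
            # Keep only positive dependence (term over-represented in this label).
            # With tiny expected counts the chi-squared approximation is rough;
            # Fisher's exact test is a common alternative.
            if a * d > b * c and p_value_df1(chi2_2x2(a, b, c, d)) < alpha:
                lfs[(term, label)] = make_lf(term, label)
    return lfs


def apply_lfs(text: str, lfs: Dict[Tuple[str, str], LabelingFunction]) -> Optional[str]:
    # Majority vote over non-abstaining labeling functions; None if all abstain.
    votes = Counter(v for v in (lf(text) for lf in lfs.values()) if v is not None)
    return votes.most_common(1)[0][0] if votes else None


# Invented toy logbook entries and labels, for illustration only.
DOCS = [
    "hydraulic leak at pump", "hydraulic pressure low", "hydraulic fluid replaced",
    "hydraulic line leak", "hydraulic pump noise",
    "cabin light inoperative", "light bulb replaced", "cabin light flickers",
    "wiring light fault", "light switch broken",
]
LABELS = ["HYD"] * 5 + ["ELEC"] * 5

if __name__ == "__main__":
    lfs = mine_labeling_functions(DOCS, LABELS)
    print(sorted(lfs))  # [('hydraulic', 'HYD'), ('light', 'ELEC')]
    print(apply_lfs("hydraulic leak in cargo bay", lfs))  # HYD
```

On the toy data only `hydraulic` (for `HYD`) and `light` (for `ELEC`) pass the test at alpha = 0.05. The term `cabin`, which appears in two of the five `ELEC` entries and in no `HYD` entry, gives a chi-squared value of 2.5 (p ≈ 0.11) and is discarded. Counts this small are well below what the chi-squared approximation assumes, which is why the code comment points to Fisher's exact test as an alternative.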
dc.contributor.none.fl_str_mv Henriques, Roberto André Pereira
RUN
dc.subject.por.fl_str_mv Document Classification
Technical Language
Supervised Machine Learning
Unsupervised Machine Learning
Semi-Supervised Machine Learning
SDG 9 - Industry, innovation and infrastructure
Domain/Scientific Area::Natural Sciences::Computer and Information Sciences
dc.date.none.fl_str_mv 2024-03-12T17:24:14Z
2024-02-01
2024-02-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/164774
TID:203544110
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP