Automatic document classification in technical logbooks: A comparative study of supervised, weakly supervised and unsupervised machine learning approaches
Main author: | Schmidt, Robin Karl |
---|---|
Publication date: | 2024 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10362/164774 |
Summary: | Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics |
id |
RCAP_83f914966f716d6085390ed6d0c8a913 |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/164774 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Automatic document classification in technical logbooks: A comparative study of supervised, weakly supervised and unsupervised machine learning approaches Document Classification Technical Language Supervised Machine Learning Unsupervised Machine Learning Semi-Supervised Machine Learning SDG 9 - Industry, innovation and infrastructure Domínio/Área Científica::Ciências Naturais::Ciências da Computação e da Informação Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics This study investigates automatic document classification in technical logbooks, introducing the innovative Labeling Functions by Dependency methodology. Departing from conventional weakly supervised approaches, this method focuses on unveiling linguistic patterns and dependencies within textual data, employing chi-squared tests for statistical validation. Alongside unsupervised and supervised approaches, Labeling Functions by Dependency demonstrated notable efficacy, highlighting the importance of thorough NLP preprocessing. The study contributes insights into weakly supervised learning, emphasizing the pivotal role of linguistic dependencies and preprocessing in achieving accurate document classification.
The novel approach opens avenues for advancements in machine learning methodologies tailored to unstructured textual data. Henriques, Roberto André Pereira RUN Schmidt, Robin Karl 2024-03-12T17:24:14Z 2024-02-01 2024-02-01T00:00:00Z info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/masterThesis application/pdf http://hdl.handle.net/10362/164774 TID:203544110 eng info:eu-repo/semantics/openAccess reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP 2024-03-18T01:45:13Z oai:run.unl.pt:10362/164774 Portal Agregador ONG https://www.rcaap.pt/oai/openaire opendoar:7160 2024-03-20T04:01:59.817717 Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação false |
dc.title.none.fl_str_mv |
Automatic document classification in technical logbooks: A comparative study of supervised, weakly supervised and unsupervised machine learning approaches |
title |
Automatic document classification in technical logbooks: A comparative study of supervised, weakly supervised and unsupervised machine learning approaches |
spellingShingle |
Automatic document classification in technical logbooks: A comparative study of supervised, weakly supervised and unsupervised machine learning approaches Schmidt, Robin Karl Document Classification Technical Language Supervised Machine Learning Unsupervised Machine Learning Semi-Supervised Machine Learning SDG 9 - Industry, innovation and infrastructure Domínio/Área Científica::Ciências Naturais::Ciências da Computação e da Informação |
title_short |
Automatic document classification in technical logbooks: A comparative study of supervised, weakly supervised and unsupervised machine learning approaches |
title_full |
Automatic document classification in technical logbooks: A comparative study of supervised, weakly supervised and unsupervised machine learning approaches |
title_fullStr |
Automatic document classification in technical logbooks: A comparative study of supervised, weakly supervised and unsupervised machine learning approaches |
title_full_unstemmed |
Automatic document classification in technical logbooks: A comparative study of supervised, weakly supervised and unsupervised machine learning approaches |
title_sort |
Automatic document classification in technical logbooks: A comparative study of supervised, weakly supervised and unsupervised machine learning approaches |
author |
Schmidt, Robin Karl |
author_facet |
Schmidt, Robin Karl |
author_role |
author |
dc.contributor.none.fl_str_mv |
Henriques, Roberto André Pereira RUN |
dc.contributor.author.fl_str_mv |
Schmidt, Robin Karl |
dc.subject.por.fl_str_mv |
Document Classification Technical Language Supervised Machine Learning Unsupervised Machine Learning Semi-Supervised Machine Learning SDG 9 - Industry, innovation and infrastructure Domínio/Área Científica::Ciências Naturais::Ciências da Computação e da Informação |
topic |
Document Classification Technical Language Supervised Machine Learning Unsupervised Machine Learning Semi-Supervised Machine Learning SDG 9 - Industry, innovation and infrastructure Domínio/Área Científica::Ciências Naturais::Ciências da Computação e da Informação |
description |
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics |
publishDate |
2024 |
dc.date.none.fl_str_mv |
2024-03-12T17:24:14Z 2024-02-01 2024-02-01T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/164774 TID:203544110 |
url |
http://hdl.handle.net/10362/164774 |
identifier_str_mv |
TID:203544110 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799138192594567168 |