Explaing portuguese's public administration absenteeism through data mining

Bibliographic details
Main author: Costa, Leandro Miguel Bartolomeu da Cruz
Publication date: 2018
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10071/18298
Abstract: The Portuguese Public Administration (PPA) is the largest employer in the country, with 12.8% of Portugal's active population working for it. Absenteeism and productivity are closely connected, so organizations in both the public and private sectors should keep absenteeism in mind to prevent process failures and loss of profit. The main goal of this study is to understand absenteeism in the PPA, particularly the duration of a worker's next absence and what leads to it, and to explain it by building a data mining model that fits the problem. To study absenteeism in the PPA, data were collected from a Human Capital Management (HCM) system by extracting the annual absenteeism report for 2016 and querying each worker's profile, absenteeism history and job characteristics, resulting in around 59,000 distinct absence records. Data mining techniques were used to clean the dataset, and the Recency, Frequency and Monetary (RFM) value methodology was used to add new variables to the problem, yielding richer information about the worker and the absence itself. The Support Vector Machines (SVM) algorithm was then applied to model absence duration in days, and a 10-fold cross-validation scheme was adopted to assess and confirm the model's robustness. The major findings are that features related to the worker's profile are less relevant than absence-related features; that the RFM methodology was influential, with all of its computed variables ranking among the 25 most important features; and that the most concerning employee profile was identified.
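The abstract does not say how Recency, Frequency and Monetary value were mapped onto absence records. The following is a minimal sketch under one plausible reading, not the author's actual feature definitions: recency is taken as the number of days since the worker's most recent absence began, frequency as the number of prior absences, and the "monetary" value as the total days previously absent. The function name `rfm_features`, the record layout and the sample data are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def rfm_features(absences, reference_date):
    """Compute RFM-style features per worker from past absence records.

    absences: iterable of (worker_id, start_date, duration_days) tuples.
    Only absences that started before reference_date are counted.
    The mapping of R, F and M to these quantities is an assumption.
    """
    per_worker = defaultdict(list)
    for worker_id, start, duration in absences:
        if start < reference_date:
            per_worker[worker_id].append((start, duration))

    features = {}
    for worker_id, records in per_worker.items():
        last_start = max(start for start, _ in records)
        features[worker_id] = {
            # Recency: days since the most recent absence began.
            "recency_days": (reference_date - last_start).days,
            # Frequency: number of prior absences.
            "frequency": len(records),
            # Monetary (assumed analogue): total days absent so far.
            "monetary_days": sum(duration for _, duration in records),
        }
    return features


# Hypothetical sample records, for illustration only.
records = [
    ("w1", date(2016, 1, 10), 3),
    ("w1", date(2016, 3, 1), 2),
    ("w2", date(2016, 2, 15), 10),
]
features = rfm_features(records, date(2016, 4, 1))
```

Features of this kind could then be joined to each absence record before fitting a regression model on absence duration, as the abstract describes.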
id RCAP_d4bec03016933996d6696a04a93f45a4
oai_identifier_str oai:repositorio.iscte-iul.pt:10071/18298
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Explaing portuguese's public administration absenteeism through data mining
title Explaing portuguese's public administration absenteeism through data mining
author Costa, Leandro Miguel Bartolomeu da Cruz
author_facet Costa, Leandro Miguel Bartolomeu da Cruz
author_role author
dc.contributor.author.fl_str_mv Costa, Leandro Miguel Bartolomeu da Cruz
dc.subject.por.fl_str_mv Absenteeism
Portugal public administration
Data mining
RFM
Informática aplicada à gestão
Gestão de recursos humanos
Administração pública
Absentismo
Algoritmo
Análise de dados
publishDate 2018
dc.date.none.fl_str_mv 2018-12-05T00:00:00Z
2018-12-05
2018-08
2019-07-01T12:05:15Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10071/18298
TID:202163920
url http://hdl.handle.net/10071/18298
identifier_str_mv TID:202163920
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
application/octet-stream
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799134761207201792