Explaing portuguese's public administration absenteeism through data mining
Main author: | Costa, Leandro Miguel Bartolomeu da Cruz |
---|---|
Publication date: | 2018 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10071/18298 |
Abstract: | The Portuguese Public Administration (PPA) is the largest employer in the country, with 12.8% of Portugal’s active population working for it. Absenteeism and productivity are closely linked, so organizations in both the public and private sectors should monitor absenteeism to prevent process failures and profit loss. The main goal of this study is to understand absenteeism in the PPA, in particular the duration of a worker’s next absence and what leads to it, and to explain it by building a data mining model suited to the problem. Data were collected from a Human Capital Management (HCM) system by extracting the annual absenteeism report for 2016 and querying each worker’s profile, absenteeism history, and job characteristics, which yielded around 59,000 distinct absence records. Data mining techniques were used to clean the dataset, and the Recency, Frequency and Monetary (RFM) value methodology was used to derive new variables, providing richer information about the worker and the absence itself. The Support Vector Machines (SVM) algorithm was then applied to model absence duration in days, and a 10-fold cross-validation scheme was adopted to assess and confirm the model’s robustness. The major findings are that features related to the worker’s profile are less relevant than absence-related features; that all variables computed with the RFM methodology ranked among the 25 most important features; and that the study identified the most concerning employee profile. |
id |
RCAP_d4bec03016933996d6696a04a93f45a4 |
---|---|
oai_identifier_str |
oai:repositorio.iscte-iul.pt:10071/18298 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Explaing portuguese's public administration absenteeism through data mining; Absenteeism; Portugal public administration; Data mining; RFM; Informática aplicada à gestão; Gestão de recursos humanos; Administração pública; Absentismo; Algoritmo; Análise de dados; 2019-07-01T12:05:15Z; 2018-12-05T00:00:00Z; 2018-12-05; 2018-08; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; application/pdf; application/octet-stream; http://hdl.handle.net/10071/18298; TID:202163920; eng; Costa, Leandro Miguel Bartolomeu da Cruz; info:eu-repo/semantics/openAccess; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2023-11-09T17:42:48Z; oai:repositorio.iscte-iul.pt:10071/18298; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-19T22:20:04.772525; Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false |
dc.title.none.fl_str_mv |
Explaing portuguese's public administration absenteeism through data mining |
title |
Explaing portuguese's public administration absenteeism through data mining |
spellingShingle |
Explaing portuguese's public administration absenteeism through data mining Costa, Leandro Miguel Bartolomeu da Cruz Absenteeism Portugal public administration Data mining RFM Informática aplicada à gestão Gestão de recursos humanos Administração pública Absentismo Algoritmo Análise de dados |
author |
Costa, Leandro Miguel Bartolomeu da Cruz |
author_facet |
Costa, Leandro Miguel Bartolomeu da Cruz |
author_role |
author |
dc.contributor.author.fl_str_mv |
Costa, Leandro Miguel Bartolomeu da Cruz |
dc.subject.por.fl_str_mv |
Absenteeism Portugal public administration Data mining RFM Informática aplicada à gestão Gestão de recursos humanos Administração pública Absentismo Algoritmo Análise de dados |
topic |
Absenteeism Portugal public administration Data mining RFM Informática aplicada à gestão Gestão de recursos humanos Administração pública Absentismo Algoritmo Análise de dados |
publishDate |
2018 |
dc.date.none.fl_str_mv |
2018-12-05T00:00:00Z 2018-12-05 2018-08 2019-07-01T12:05:15Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10071/18298 TID:202163920 |
url |
http://hdl.handle.net/10071/18298 |
identifier_str_mv |
TID:202163920 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf application/octet-stream |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799134761207201792 |
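
The abstract describes deriving Recency, Frequency and Monetary (RFM) variables from each worker's absenteeism history. The record does not say how the three quantities were mapped onto absences, so the following is only a minimal sketch under assumed definitions: recency as days since the start of the last absence, frequency as the number of past absences, and monetary as the total days absent. The `Absence` type, the `rfm_features` function, and the sample data are hypothetical and are not taken from the dissertation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Absence:
    """One absence record (hypothetical schema, not the HCM system's)."""
    worker_id: str
    start: date
    days: int  # duration of the absence in days


def rfm_features(history, worker_id, reference):
    """Compute RFM-style features from absences that started before `reference`.

    Assumed definitions (not confirmed by the source):
      recency   = days between the start of the last absence and `reference`
      frequency = number of past absences
      monetary  = total days absent
    """
    past = [a for a in history
            if a.worker_id == worker_id and a.start < reference]
    if not past:
        # No history: recency is undefined, the counts are zero.
        return {"recency": None, "frequency": 0, "monetary": 0}
    last_start = max(a.start for a in past)
    return {
        "recency": (reference - last_start).days,
        "frequency": len(past),
        "monetary": sum(a.days for a in past),
    }


# Made-up sample data for illustration only.
history = [
    Absence("w1", date(2016, 1, 11), 3),
    Absence("w1", date(2016, 3, 1), 5),
    Absence("w2", date(2016, 2, 15), 1),
]
features = rfm_features(history, "w1", date(2016, 3, 31))
```

Features of this kind could then be appended to each absence record before fitting a regression model on absence duration; the choice of definitions above is one plausible reading, and the dissertation itself should be consulted for the variables actually used.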