Data mining electronic health records of type 2 diabetes uncontrolled patients towards clustering LDL-cholesterol patterns

Petrovici, Mihai Daniel

Bibliographic details
Main author:	Petrovici, Mihai Daniel
Publication date:	2020
Document type:	Dissertation
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/10400.1/15204
Abstract:	Cardiovascular diseases (CVD) are the leading cause of death worldwide; they are a risk factor for patients with diabetes and also a consequence of dyslipidemia. Effective lipid management of patients with diabetes is still largely unattained and requires better awareness from both patients and healthcare professionals. To better understand the influence of clinical parameters on low-density lipoprotein (LDL) cholesterol patterns in patients with uncontrolled type 2 diabetes, data mining techniques were applied to the Electronic Health Records (EHR) provided by APDP (Associação Protetora de Diabetes Portugal). The database content was first analyzed to assess data integrity and to avoid using corrupted values or misleading information from the EHR. The statistical distribution of each clinical parameter in the database was then examined to characterize its individual behavior and to enable a statistically coherent identification of the cohort to be used when modeling LDL. As a first approach, linear modeling of LDL was considered, using both ordinary least squares and stepwise approaches. Non-linear modeling of LDL was then tested on the same populations used for linear modeling, to determine the most accurate and practical LDL model. The EHR included 32577 medical appointments (MA) held by 1767 patients between January 2008 and February 2018. More than 10 clinical features were studied, which led to limiting the case-study population to patients who had at least 5 MA during the decade. Of all MA, 32% reported LDL and 63% reported Glycated Hemoglobin (HbA1c) measurements, but some MA did not report both simultaneously. Six linear models, each relating a different set of 6 clinical parameters, were tested. Linear model 3, involving LDL, total cholesterol, HDL, triglyceride, HbA1c and platelets, was selected, with a Root Mean Square Error (RMSE) of 0.07. The model in which platelets are replaced by proteinuria has an RMSE of just 0.054 but was built on only 38 case studies. Neural network-based modeling strategies were tested as an alternative to linear models, using the Multi-Objective Genetic Algorithm (MOGA). After data preprocessing, MOGA was run twice with different threshold values. Six models were developed considering different combinations of clinical parameters. For each model, the population was divided into 3 groups: 60% to train the network, 20% to test the model and the remaining 20% to validate it. Using the populations employed by each MOGA run, the stepwise algorithm was used to identify the relevance of each clinical parameter and to create another linear model from this parameter set. The MOGA model with the best training performance was model 4, while model 2 performed best in validation, with an RMSE of 0.057. However, linear model 5, built from the parameter selection identified by MOGA, achieved an RMSE of 0.054 in validation when total cholesterol, HDL, triglyceride, HbA1c, microalbuminuria, creatinine, MDRD, sex and age are used in the LDL linear model. Therefore, LDL can be modeled by a linear model using 6 or 10 clinical variables with very low mean square error.
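Illustrative note: the abstract describes a 60%/20%/20% train/test/validation split and compares models by RMSE. The sketch below shows how such a split, an ordinary least squares fit and an RMSE evaluation could look. It is only a minimal sketch under stated assumptions, not the thesis code: the data are synthetic, the five predictors and their weights are made up, and the function names (split_60_20_20, fit_ols, predict, rmse) are hypothetical.

```python
import numpy as np

def split_60_20_20(n_samples, rng):
    """Return shuffled index arrays for train (60%), test (20%) and validation (rest)."""
    idx = rng.permutation(n_samples)
    n_train = (n_samples * 6) // 10
    n_test = (n_samples * 2) // 10
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, w1, ..., wk]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply a fitted linear model to a feature matrix."""
    return coef[0] + X @ coef[1:]

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic stand-in data (assumption): 5 normalized predictors and a
# linear target with Gaussian noise. No real clinical values are used.
rng = np.random.default_rng(0)
n = 500
X = rng.uniform(0.0, 1.0, size=(n, 5))
true_w = np.array([0.5, -0.2, -0.1, 0.05, 0.02])  # made-up weights
y = 0.1 + X @ true_w + rng.normal(0.0, 0.05, size=n)

train, test, val = split_60_20_20(n, rng)
coef = fit_ols(X[train], y[train])
print("test RMSE:", rmse(y[test], predict(coef, X[test])))
print("validation RMSE:", rmse(y[val], predict(coef, X[val])))
```

Fitting on the training subset only and reporting RMSE separately on the test and validation subsets keeps the reported error independent of the data used to estimate the coefficients.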

Item metadata

id	RCAP_0b630143c2ef9a91bc694416238a3485
oai_identifier_str	oai:sapientia.ualg.pt:10400.1/15204
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
description (Portuguese abstract, translated)	Cardiovascular diseases (CVD) remain the leading cause of death worldwide and are a risk factor for people with diabetes, who are in turn more prone to developing CVD. However, although recent guidelines cover CVD risk, effective lipid control is far from being achieved. Moreover, lipid self-management, together with the management of therapeutic decisions, is not always given adequate priority by either patients or healthcare professionals. To better understand the influence of clinical parameters on low-density lipoprotein (LDL) cholesterol in type 2 diabetes patients whose lipid management is suspected to be unstable, electronic health records (EHR) provided by APDP (Associação Protetora de Diabetes Portugal) were used for a study based on data mining techniques. (…)
dc.title.none.fl_str_mv	Data mining electronic health records of type 2 diabetes uncontrolled patients towards clustering LDL-cholesterol patterns
author	Petrovici, Mihai Daniel
author_facet	Petrovici, Mihai Daniel
author_role	author
dc.contributor.none.fl_str_mv	Ruano, M. Graça; Ribeiro, Rogério José Tavares; Sapientia
dc.contributor.author.fl_str_mv	Petrovici, Mihai Daniel
dc.subject.por.fl_str_mv	Low-density lipoprotein; Diabetes; Data mining; Model; Domain/Scientific Area::Engineering and Technology::Other Engineering and Technologies
topic	Low-density lipoprotein; Diabetes; Data mining; Model; Domain/Scientific Area::Engineering and Technology::Other Engineering and Technologies
publishDate	2020
dc.date.none.fl_str_mv	2020-06-26 2020-06-26T00:00:00Z 2021-03-09T14:59:21Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10400.1/15204 TID:202663418
url	http://hdl.handle.net/10400.1/15204
identifier_str_mv	TID:202663418
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799133301705801728
