The dimension reduction power of ClustOfVar: application of the variable cluster analysis technique in a mixed data health database
Main author: | Oliveira, Natacha |
---|---|
Publication date: | 2023 |
Other authors: | Severo, Milton |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10773/38016 |
Abstract: | Background/Objective: Technological evolution is increasingly making real the elements necessary for the daily practice of personalized medicine, an improved vision of health care whose decisions regarding prognosis, diagnosis and therapeutic strategies depend on the patient's various characteristics. This approach leads to the collection and use of information that is broad in extension and complexity, for which dimensionality reduction techniques are imperative, in order to simplify and understand it. This paper aims to show the value of the ClustOfVar technique, a variable clustering technique capable of dealing with mixed data, resulting in data reduction. Through its hierarchical and non-hierarchical approaches, it replaces sample variables with representative synthetic variables. This dimensional reduction can be extended to individuals by applying Ward's method. Methods: The cleaning process of anthropometric, obstetric, vital signs and pubertal status data from 700 participants of the Generation XXI cohort and/or their mothers led to variables being removed (181 down to 105 variables, 82 quantitative and 23 qualitative). Then, the hierarchical technique of the ClustOfVar package was applied, which started by building a hierarchy of variables. The optimal number of clusters was then determined, considering the aggregation level plot and the bootstrap methodology, and each cluster was characterized. The partition into clusters was then tried with the non-hierarchical process. Once the partition was defined, Ward's method was applied, dividing the participants into clusters. We finished with their description according to the synthetic variables. Results: The partition into 8 clusters of variables suggested by the hierarchical technique was chosen, with the first and third clusters consisting mainly of maternal characteristics (relating mainly to menstruation and physical measurements, respectively). While cluster 2 mixes maternal and individual characteristics, cluster 4 contains only patient variables at birth. Cluster 5 is the most diverse, with anthropometric and related measurements of vital signs and blood macromolecules. Cluster 6 has total mass and fat measurements. Finally, cluster 7 is related to pubertal status variables, and cluster 8 includes cholesterol variables. The clustering of individuals results in the creation of specific profiles for each of the 8 clusters of individuals. Conclusions: The ClustOfVar technique accomplishes a data transformation relevant to the dispersion of personalized medicine. However, it lacks the ability to deal with high proportions of missing data and its bootstrap process is very time-consuming. |
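
The pipeline summarized in the abstract (cluster the variables, replace each cluster with a synthetic variable, then apply Ward's method to the individuals) can be illustrated with a minimal Python sketch. This is not the ClustOfVar R package and not the authors' code: it is a simplified stand-in that handles quantitative variables only, uses average-linkage clustering on a `1 - r^2` dissimilarity instead of ClustOfVar's homogeneity criterion, and takes the first principal component of each cluster as its synthetic variable. The function names, the toy data and the cluster counts are all hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_variables(X, n_clusters):
    """Group the columns of X with average linkage on a 1 - r^2 dissimilarity.

    Assumption: quantitative variables only; this is a stand-in for, not a
    reimplementation of, the ClustOfVar hierarchical procedure.
    """
    corr = np.corrcoef(X, rowvar=False)
    dist = 1.0 - corr ** 2
    np.fill_diagonal(dist, 0.0)
    dist = (dist + dist.T) / 2.0  # enforce exact symmetry
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def synthetic_variables(X, labels):
    """Return one column per variable cluster: the first principal component
    of the standardized variables in that cluster."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    columns = []
    for k in np.unique(labels):
        block = Xs[:, labels == k]
        u, s, _ = np.linalg.svd(block, full_matrices=False)
        columns.append(u[:, 0] * s[0])  # scores on the first component
    return np.column_stack(columns)


def cluster_individuals(S, n_clusters):
    """Apply Ward's method to the rows (individuals) of the synthetic variables."""
    tree = linkage(S, method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Hypothetical toy data: six variables driven by two latent factors.
rng = np.random.default_rng(0)
n = 200
f1, f2 = rng.normal(size=(2, n))
X = np.column_stack(
    [f1 + 0.1 * rng.normal(size=n) for _ in range(3)]
    + [f2 + 0.1 * rng.normal(size=n) for _ in range(3)]
)

var_labels = cluster_variables(X, n_clusters=2)
S = synthetic_variables(X, var_labels)
ind_labels = cluster_individuals(S, n_clusters=3)
print(var_labels, S.shape, np.bincount(ind_labels)[1:])
```

In the toy data the six variables collapse to two synthetic variables, and the individuals are then grouped on those two columns alone, which mirrors the reduction from 105 variables to 8 synthetic variables described in the abstract.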
id |
RCAP_a3168ce058bce79dc0a3a828db16d560 |
---|---|
oai_identifier_str |
oai:ria.ua.pt:10773/38016 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
author |
Oliveira, Natacha |
author_role |
author |
author2 |
Severo, Milton |
author2_role |
author |
dc.subject.por.fl_str_mv |
ClustOfVar; Clustering; Dimension reduction; Generation XXI cohort; Mixed data |
dc.date.none.fl_str_mv |
2023-06-14T07:55:10Z 2023-05-31T00:00:00Z 2023-05-31 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10773/38016 |
dc.language.iso.fl_str_mv |
eng |
dc.relation.none.fl_str_mv |
ISSN 2184-5794; DOI 10.34624/jshd.v5i2.31495 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
University of Aveiro |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |