The dimension reduction power of ClustOfVar: application of the variable cluster analysis technique in a mixed data health database

Oliveira, Natacha; Severo, Milton


Bibliographic details
Main author:	Oliveira, Natacha
Publication date:	2023
Other authors:	Severo, Milton
Document type:	Article
Language:	English
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text:	https://doi.org/10.34624/jshd.v5i2.31495
Abstract:	Background/Objective: Technological advances are increasingly providing what the everyday practice of personalized medicine requires: an approach to health care in which decisions on prognosis, diagnosis and therapeutic strategy depend on each patient's characteristics. This approach involves collecting and using information of great breadth and complexity, which makes dimensionality reduction techniques essential for simplifying and understanding it. This paper aims to show the value of ClustOfVar, a variable clustering technique that can handle mixed (quantitative and qualitative) data and thereby reduce their dimension. Through its hierarchical and non-hierarchical procedures, it replaces the original variables with representative synthetic variables. This dimension reduction can be extended to individuals by applying Ward's method.
Methods: Cleaning the anthropometric, obstetric, vital-sign and pubertal-status data of 700 participants of the Generation XXI cohort and/or their mothers reduced the number of variables from 181 to 105 (82 quantitative and 23 qualitative). The hierarchical procedure of the ClustOfVar package was then applied, starting by building a hierarchy of the variables. The optimal number of clusters was determined from the aggregation-levels plot and the bootstrap procedure, and each cluster was characterized. The partition into clusters was then also attempted with the non-hierarchical procedure. Once the partition was defined, Ward's method was applied to divide the participants into clusters, which were finally described in terms of the synthetic variables.
Results: The 8-cluster partition of the variables suggested by the hierarchical procedure was chosen. Clusters 1 and 3 consist mainly of maternal characteristics (relating mainly to menstruation and to physical measurements, respectively). Cluster 2 mixes maternal and participant characteristics, whereas cluster 4 contains only the participants' variables at birth. Cluster 5 is the most diverse, combining anthropometric measurements with related vital-sign and blood-macromolecule measurements. Cluster 6 groups total mass and fat measurements. Finally, cluster 7 comprises pubertal-status variables and cluster 8 cholesterol variables. Clustering the individuals produced a specific profile for each of the 8 clusters of individuals.
Conclusions: ClustOfVar performs a data transformation relevant to the dissemination of personalized medicine. However, it cannot handle high proportions of missing data, and its bootstrap procedure is very time-consuming.
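The hierarchical step described in the Methods can be illustrated with a small Python sketch. ClustOfVar itself is an R package (its hierarchical procedure is `hclustvar`), and the code below does not reproduce its API. It assumes, following the method's published description as we understand it, that the homogeneity H of a cluster is the largest eigenvalue of a PCAmix-style analysis of its variables, i.e. the sum of squared correlations (quantitative variables) and correlation ratios (qualitative variables) with the cluster's synthetic variable, and that at each step the two clusters merged are those with the smallest aggregation level H(A) + H(B) - H(A ∪ B). The functions `mixed_design`, `homogeneity` and `hclust_variables` and the toy variables are hypothetical.

```python
import itertools

import numpy as np


def mixed_design(quant, qual):
    """Hypothetical PCAmix-style design matrix Z for one cluster of variables.

    Quantitative columns are standardized (variance with 1/n). Each qualitative
    variable becomes centered indicator columns scaled by sqrt(n / n_s), so the
    largest eigenvalue of (1/n) Z^T Z equals the sum of r^2 (quantitative) and
    eta^2 (qualitative) with the best synthetic variable.
    """
    cols = []
    for x in quant:
        x = np.asarray(x, dtype=float)
        cols.append((x - x.mean()) / x.std())
    for z in qual:
        z = np.asarray(z)
        n = len(z)
        for level in np.unique(z):
            g = (z == level).astype(float)
            cols.append((g - g.mean()) * np.sqrt(n / g.sum()))
    return np.column_stack(cols)


def homogeneity(quant, qual):
    """Return (H, synthetic): H is the largest eigenvalue of (1/n) Z^T Z and
    synthetic is the unit-variance first principal component of Z."""
    Z = mixed_design(quant, qual)
    n = Z.shape[0]
    eigvals, eigvecs = np.linalg.eigh(Z.T @ Z / n)  # eigenvalues in ascending order
    lam, v = eigvals[-1], eigvecs[:, -1]
    return lam, Z @ v / np.sqrt(lam)


def hclust_variables(variables):
    """variables: dict name -> ("quant" | "qual", values).
    Returns the merges as (cluster_a, cluster_b, aggregation_level)."""
    def H(names):
        quant = [variables[v][1] for v in names if variables[v][0] == "quant"]
        qual = [variables[v][1] for v in names if variables[v][0] == "qual"]
        return homogeneity(quant, qual)[0]

    clusters = [(name,) for name in variables]  # start with one variable per cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in itertools.combinations(clusters, 2):
            level = H(a) + H(b) - H(a + b)  # loss of homogeneity if a and b merge
            if best is None or level < best[0]:
                best = (level, a, b)
        level, a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, level))
    return merges


# Toy mixed data (hypothetical): two perfectly correlated quantitative
# variables, one unrelated quantitative variable and one qualitative variable.
rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
data = {
    "a": ("quant", x),
    "a_copy": ("quant", 2 * x + 1),
    "b": ("quant", rng.normal(size=n)),
    "g": ("qual", rng.choice(["low", "mid", "high"], size=n)),
}
merges = hclust_variables(data)
for a, b, level in merges:
    print(a, "+", b, "-> aggregation level", round(level, 3))
```

In this toy example the first merge joins `a` and `a_copy` at an aggregation level of about 0, because a perfectly correlated pair loses no homogeneity when merged; the levels of the later merges are the kind of values an aggregation-levels plot displays when choosing the number of clusters.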

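The final step, dividing the participants into clusters with Ward's method, can be sketched in Python as well. The sketch assumes that Ward's method is run on the individuals' scores on the synthetic variables (an n × K matrix, one column per cluster of variables); the scores here are simulated, and `ward_clusters` is a hypothetical, deliberately naive implementation. In practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage(X, method="ward")` would normally be preferred.

```python
import numpy as np


def ward_clusters(X, k):
    """Hypothetical agglomerative Ward clustering of the rows of X into k clusters.

    At each step, merge the two clusters whose union increases the total
    within-cluster sum of squares the least:
        delta(A, B) = |A||B| / (|A| + |B|) * ||mean_A - mean_B||^2
    Naive implementation, intended only for small illustrative data sets.
    """
    X = np.asarray(X, dtype=float)
    clusters = [[i] for i in range(len(X))]  # start with one individual per cluster
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                A, B = clusters[a], clusters[b]
                diff = X[A].mean(axis=0) - X[B].mean(axis=0)
                delta = len(A) * len(B) / (len(A) + len(B)) * (diff @ diff)
                if best is None or delta < best[0]:
                    best = (delta, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels


# Simulated synthetic-variable scores (hypothetical): 20 individuals, 2 synthetic
# variables, forming two well-separated groups of 10.
rng = np.random.default_rng(1)
scores = np.vstack([
    rng.normal(-3.0, 0.3, size=(10, 2)),
    rng.normal(3.0, 0.3, size=(10, 2)),
])
labels = ward_clusters(scores, k=2)
print(labels)
```

Each resulting group of individuals can then be profiled by its mean score on each synthetic variable, which is one way to describe clusters of individuals in terms of the synthetic variables.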
Item metadata

OAI identifier	oai:proa.ua.pt:article/31495
Date published	2023-05-31
Version	Published version
Related links	https://proa.ua.pt/index.php/jshd/article/view/31495 https://proa.ua.pt/index.php/jshd/article/view/31495/22199
Rights	Copyright (c) 2023 Natacha Oliveira, Milton Severo; licensed under CC BY-NC-ND 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0); open access
Format	application/pdf
Publisher	University of Aveiro (UA) and Hospital Center of Baixo Vouga (CHBV)
Source	Journal of Statistics on Health Decision, Vol. 5, No. 2 (2023): Special Issue - Statistics on Health Decision Making: Personalized Medicine; e31495; ISSN 2184-5794
Repository	Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação