Similarity-based predictive models: Sensitivity analysis and a biological application with multi-attributes

Bibliographic details
Main author: Sanchez, Jeniffer D.
Publication date: 2023
Other authors: Rêgo, Leandro C., Ospina, Raydonal, Leiva, Víctor, Chesneau, Christophe, Castro, Cecília
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: https://hdl.handle.net/1822/85507
Abstract: Predictive models based on empirical similarity are instrumental in biology and data science, where the premise is to measure the likeness of one observation with others in the same dataset. Biological datasets often encompass data that can be categorized. When using empirical similarity-based predictive models, two strategies for handling categorical covariates exist. The first strategy retains categorical covariates in their original form, applying distance measures and allocating weights to each covariate. In contrast, the second strategy creates binary variables, representing each variable level independently, and computes similarity measures solely through the Euclidean distance. This study performs a sensitivity analysis of these two strategies using computational simulations, and applies the results to a biological context. We use a linear regression model as a reference point, and consider two methods for estimating the model parameters, alongside exponential and fractional inverse similarity functions. The sensitivity is evaluated by determining the coefficient of variation of the parameter estimators across the three models as a measure of relative variability. Our results suggest that the first strategy excels over the second one in effectively dealing with categorical variables, and offers greater parsimony due to the use of fewer parameters.
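The two strategies in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, the unit weights, the 0/1 mismatch distance for categorical covariates, and the similarity forms exp(-d) and 1/(1 + d) are all assumptions chosen for illustration, and the record does not say how the weights are estimated.

```python
import math
from statistics import mean, stdev


def exponential_similarity(d):
    # Assumed form: s = exp(-d), where d is a non-negative distance.
    return math.exp(-d)


def fractional_inverse_similarity(d):
    # Assumed form: s = 1 / (1 + d).
    return 1.0 / (1.0 + d)


def mixed_distance(a, b, weights):
    # Strategy 1 (sketch): one weight per covariate; categorical values
    # (strings here) contribute 0 on a match and 1 on a mismatch.
    total = 0.0
    for x, y, w in zip(a, b, weights):
        diff = (0.0 if x == y else 1.0) if isinstance(x, str) else (x - y) ** 2
        total += w * diff
    return total


def one_hot(value, levels):
    # Strategy 2 (sketch): one binary variable per level of the covariate.
    return [1.0 if value == level else 0.0 for level in levels]


def squared_euclidean(a, b):
    # Squared Euclidean distance between two numeric vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(target, rows, responses, distance, similarity):
    # Similarity-weighted average of the observed responses.
    sims = [similarity(distance(target, row)) for row in rows]
    return sum(s * y for s, y in zip(sims, responses)) / sum(sims)


def coefficient_of_variation(estimates):
    # Sample standard deviation relative to the absolute mean.
    return stdev(estimates) / abs(mean(estimates))


# Hypothetical toy data: one numeric and one categorical covariate.
rows = [(1.0, "A"), (2.0, "B"), (3.0, "A")]
responses = [10.0, 20.0, 30.0]
target = (2.0, "A")
levels = ["A", "B"]

# Strategy 1: original categorical covariate, two weights in total.
pred_strategy_1 = predict(
    target, rows, responses,
    lambda a, b: mixed_distance(a, b, (1.0, 1.0)),
    exponential_similarity,
)

# Strategy 2: binary variables for each level, Euclidean distance only.
def encode(row):
    return [row[0]] + one_hot(row[1], levels)

pred_strategy_2 = predict(
    encode(target), [encode(r) for r in rows], responses,
    squared_euclidean,
    fractional_inverse_similarity,
)
```

In this sketch the first strategy needs one weight per covariate, while a weighted version of the second would need one per binary variable, which is the parsimony argument the abstract makes. `coefficient_of_variation` would be applied to parameter estimates collected over repeated simulations.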
id RCAP_a6a6e2128ea2e5694444f8aa0b31a729
oai_identifier_str oai:repositorium.sdum.uminho.pt:1822/85507
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling ANCD - Agenția Națională pentru Cercetare și Dezvoltare (UIDB/00013/2020)
2023-08-12T01:17:30Z
Portal Agregador
ONG
https://www.rcaap.pt/oai/openaire
opendoar:7160
2024-03-19T19:00:42.764442
false
dc.title.none.fl_str_mv Similarity-based predictive models: Sensitivity analysis and a biological application with multi-attributes
title Similarity-based predictive models: Sensitivity analysis and a biological application with multi-attributes
author Sanchez, Jeniffer D.
author_facet Sanchez, Jeniffer D.
Rêgo, Leandro C.
Ospina, Raydonal
Leiva, Víctor
Chesneau, Christophe
Castro, Cecília
author_role author
author2 Rêgo, Leandro C.
Ospina, Raydonal
Leiva, Víctor
Chesneau, Christophe
Castro, Cecília
author2_role author
author
author
author
author
dc.contributor.none.fl_str_mv Universidade do Minho
dc.contributor.author.fl_str_mv Sanchez, Jeniffer D.
Rêgo, Leandro C.
Ospina, Raydonal
Leiva, Víctor
Chesneau, Christophe
Castro, Cecília
dc.subject.por.fl_str_mv Biological data
Coefficient of variation
Data science
Distance measures
Estimation methods
Predictive modeling
Monte Carlo simulation
Similarity functions
Natural Sciences::Mathematics
Partnerships for the goals
publishDate 2023
dc.date.none.fl_str_mv 2023-07
2023-07-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://hdl.handle.net/1822/85507
url https://hdl.handle.net/1822/85507
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 2079-7737
10.3390/biology12070959
https://www.mdpi.com/2079-7737/12/7/959
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv MDPI
publisher.none.fl_str_mv MDPI
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799132403056246784