DI2: prior-free and multi-item discretization of biological data and its applications

Bibliographic details
Main author: Alexandre, Leonardo
Publication date: 2021
Other authors: Costa, Rafael S., Henriques, Rui
Document type: Article
Language: English
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10362/128950
Funding reference: CEECIND/01399/2017
Title: DI2: prior-free and multi-item discretization of biological data and its applications
Keywords: Data mining; Heterogeneous biological data; Multi-item discretization; Prior-free discretization; Structural Biology; Biochemistry; Molecular Biology; Computer Science Applications; Applied Mathematics
Funding reference: CEECIND/01399/2017
Abstract:
Background: A considerable number of data mining approaches for biomedical data analysis, including state-of-the-art associative models, require some form of data discretization. Although diverse discretization approaches have been proposed, they generally operate under a strict set of statistical assumptions, which are arguably insufficient to handle the diversity and heterogeneity of clinical and molecular variables within a given dataset. In addition, although an increasing number of symbolic approaches in bioinformatics can assign multiple items to values occurring near discretization boundaries for greater robustness, no reference principles exist for how to perform multi-item discretization.
Results: In this study, DI2, an unsupervised discretization method for variables with arbitrarily skewed distributions, is proposed. Statistical tests applied to assess differences in performance confirm that DI2 generally outperforms well-established discretization methods with statistical significance. Within classification tasks, DI2 displays either competitive or superior levels of predictive accuracy, particularly for classifiers able to accommodate border values.
Conclusions: This work proposes DI2, a new unsupervised method for data discretization that takes into account the underlying data regularities, the presence of outlier values disrupting expected regularities, and the relevance of border values. DI2 is available at https://github.com/JupitersMight/DI2
Contributors: LAQV@REQUIMTE; DQ - Departamento de Química; RUN
Dates: 2021-12-09T23:39:53Z; 2021-12; 2021-12-01T00:00:00Z
Status: published version
Format: application/pdf
ISSN: 1471-2105
PURE: 34773354
DOI: https://doi.org/10.1186/s12859-021-04329-8
Rights: open access
Source: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Institution: Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação (RCAAP)
OAI identifier: oai:run.unl.pt:10362/128950
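The abstract refers to multi-item discretization, where values lying near a discretization boundary receive more than one item. The sketch below illustrates only that general idea; it is not DI2's method, which is described in the paper and implemented in the repository linked above. It assumes equal-frequency (quantile) cut points and a hypothetical `border_fraction` margin, and the function name `multi_item_discretize` is likewise hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def multi_item_discretize(values, n_bins=3, border_fraction=0.1):
    """Return a sorted list of item (bin) indices for each value.

    Hypothetical illustration, not DI2's algorithm: cut points are
    equal-frequency quantiles, and a value lying within
    border_fraction * (width of the narrower adjacent bin) of a cut point
    is also assigned the item on the other side of that cut.
    """
    # n_bins - 1 interior cut points; "inclusive" keeps them within [min, max]
    cuts = quantiles(values, n=n_bins, method="inclusive")
    edges = [min(values), *cuts, max(values)]
    result = []
    for v in values:
        # Primary item: number of cut points <= v, a bin index in [0, n_bins - 1]
        items = {bisect_right(cuts, v)}
        for i, cut in enumerate(cuts):
            # Cut i separates bin i (edges[i]..edges[i+1]) from bin i + 1
            width = min(edges[i + 1] - edges[i], edges[i + 2] - edges[i + 1])
            if abs(v - cut) <= border_fraction * width:
                items.update((i, i + 1))  # border value: keep both items
        result.append(sorted(items))
    return result


if __name__ == "__main__":
    print(multi_item_discretize(list(range(1, 11)), n_bins=2, border_fraction=0.2))
    # [[0], [0], [0], [0], [0, 1], [0, 1], [1], [1], [1], [1]]
```

With two bins over the values 1 to 10, the single cut falls at 5.5; under a 0.2 margin, the values 5 and 6 sit close enough to that boundary to receive both items, while all other values keep a single item. A fixed margin is used here only for simplicity; a method that accounts for skewed distributions and outliers, as DI2 is described to do, would need more than this.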