Finding the combination of multiple biomarkers to diagnose oral squamous cell carcinoma – A data mining approach

da Costa, Nattane Luíza; de Sá Alves, Mariana [UNESP]; de Sá Rodrigues, Nayara [UNESP]; Bandeira, Celso Muller [UNESP]; Oliveira Alves, Mônica Ghislaine; Mendes, Maria Anita; Cesar Alves, Levy Anderson; Almeida, Janete Dias [UNESP]; Barbosa, Rommel

Bibliographic details
Main author:	da Costa, Nattane Luíza
Publication date:	2022
Other authors:	de Sá Alves, Mariana [UNESP]; de Sá Rodrigues, Nayara [UNESP]; Bandeira, Celso Muller [UNESP]; Oliveira Alves, Mônica Ghislaine; Mendes, Maria Anita; Cesar Alves, Levy Anderson; Almeida, Janete Dias [UNESP]; Barbosa, Rommel
Document type:	Article
Language:	eng
Source title:	Repositório Institucional da UNESP
Full text:	http://dx.doi.org/10.1016/j.compbiomed.2022.105296; http://hdl.handle.net/11449/234108
Abstract:	Data mining has proven to be a reliable method for analyzing and discovering useful knowledge about various diseases, including cancer. In particular, the use of data mining and machine learning algorithms to study oral squamous cell carcinoma (OSCC), the most common form of oral cancer, is a new area of research. This malignant neoplasm can be studied using saliva samples. Saliva is an important biofluid that must be used to verify potential biomarkers associated with oral cancer. In this study, we first provide an overview of OSCC diagnoses based on machine learning and salivary metabolites. To our knowledge, this is the first study to apply advanced data mining techniques to diagnose OSCC. We then present new results from classification and feature selection algorithms used to identify potential salivary biomarkers of OSCC. To accomplish this task, we used the random forest importance filter feature selection algorithm and a wrapper methodology to evaluate the importance of metabolites obtained from gas chromatography-mass spectrometry (GC-MS) for differentiating the OSCC group from the control group. Salivary samples (n = 68) were collected for the control group, and the OSCC group were from patients matched for gender, age, and smoking habit. Classification was performed with the Random Forest (RF) algorithm using 10-fold cross-validation. The results showed that glucuronic acid, maleic acid, and batyl alcohol can classify the samples with an area under the curve (AUC) of 0.91, versus an AUC of 0.76 using all 51 metabolites analyzed. The methodology used in this study can assist healthcare professionals and be adopted to discover diagnostic biomarkers for other diseases.
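The abstract evaluates its classifiers by AUC under 10-fold cross-validation. As a minimal, standard-library-only sketch of those two building blocks (not the authors' implementation, which used Random Forest on GC-MS metabolite data), the snippet below computes AUC through its rank interpretation and assigns samples to stratified folds. The function names `auc_score` and `stratified_folds`, and the toy labels and scores, are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    AUC equals the probability that a randomly chosen positive sample
    scores higher than a randomly chosen negative one; ties count 0.5.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample from each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def stratified_folds(labels: Sequence[int], k: int = 10) -> List[List[int]]:
    """Deal sample indices into k folds, class by class, round-robin,
    so that every fold keeps roughly the overall class ratio."""
    folds: List[List[int]] = [[] for _ in range(k)]
    counter = 0  # running position, so fold sizes stay balanced across classes
    for cls in sorted(set(labels)):
        for index, y in enumerate(labels):
            if y == cls:
                folds[counter % k].append(index)
                counter += 1
    return folds


# Toy data (hypothetical): 1 = case, 0 = control, with classifier scores.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(round(auc_score(toy_labels, toy_scores), 3))  # 8 of 9 pairs ranked correctly -> 0.889
```

In a full pipeline, each fold would serve once as the held-out set while a classifier is trained on the remaining nine, and the held-out scores would be passed to `auc_score`. Any feature selection step would need to run inside each training split to avoid an optimistic AUC estimate.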

Item metadata

id	UNSP_57c2ceb4862dd2eba3d4e6e9b7b9872d
oai_identifier_str	oai:repositorio.unesp.br:11449/234108
network_acronym_str	UNSP
network_name_str	Repositório Institucional da UNESP
repository_id_str	2946
spelling	Finding the combination of multiple biomarkers to diagnose oral squamous cell carcinoma – A data mining approach; Data mining; Feature selection; Machine learning; Metabolites; Oral squamous cell carcinoma; Salivary biomarkers; Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES); Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq); Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP); Informatics Nucleo Goiano Federal Institute of Education Science and Technology, Campus Urutaí; Department of Biosciences and Oral Diagnosis Institute of Science and Technology São Paulo State University (Unesp); Technology Research Center (NPT) Universidade Mogi das Cruzes; School of Medicine Anhembi Morumbi University; Dempster MS Lab Universidade de São Paulo; School of Dentistry Universidade Paulista; School of Dentistry Universidade Municipal de São Caetano do Sul; Instituto de Informática Universidade Federal de Goiás; Department of Biosciences and Oral Diagnosis Institute of Science and Technology São Paulo State University (Unesp); FAPESP: 2016/08633-0; Science and Technology; Universidade Estadual Paulista (UNESP); Universidade Mogi das Cruzes; Anhembi Morumbi University; Universidade de São Paulo (USP); Universidade Paulista; Universidade Municipal de São Caetano do Sul; Universidade Federal de Goiás (UFG); da Costa, Nattane Luíza; de Sá Alves, Mariana [UNESP]; de Sá Rodrigues, Nayara [UNESP]; Bandeira, Celso Muller [UNESP]; Oliveira Alves, Mônica Ghislaine; Mendes, Maria Anita; Cesar Alves, Levy Anderson; Almeida, Janete Dias [UNESP]; Barbosa, Rommel; 2022-05-01T13:41:29Z; 2022-05-01T13:41:29Z; 2022-04-01; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article; http://dx.doi.org/10.1016/j.compbiomed.2022.105296; Computers in Biology and Medicine, v. 143.; 1879-0534; 0010-4825; http://hdl.handle.net/11449/234108; 10.1016/j.compbiomed.2022.105296; 2-s2.0-85124169435; Scopus; reponame:Repositório Institucional da UNESP; instname:Universidade Estadual Paulista (UNESP); instacron:UNESP; eng; Computers in Biology and Medicine; info:eu-repo/semantics/openAccess; 2022-05-01T13:41:29Z; oai:repositorio.unesp.br:11449/234108; Repositório Institucional; PUB; http://repositorio.unesp.br/oai/request; opendoar:2946; 2024-08-05T16:14:31.258612; Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP); false
dc.title.none.fl_str_mv	Finding the combination of multiple biomarkers to diagnose oral squamous cell carcinoma – A data mining approach
title	Finding the combination of multiple biomarkers to diagnose oral squamous cell carcinoma – A data mining approach
spellingShingle	Finding the combination of multiple biomarkers to diagnose oral squamous cell carcinoma – A data mining approach; da Costa, Nattane Luíza; Data mining; Feature selection; Machine learning; Metabolites; Oral squamous cell carcinoma; Salivary biomarkers
author	da Costa, Nattane Luíza
author_facet	da Costa, Nattane Luíza; de Sá Alves, Mariana [UNESP]; de Sá Rodrigues, Nayara [UNESP]; Bandeira, Celso Muller [UNESP]; Oliveira Alves, Mônica Ghislaine; Mendes, Maria Anita; Cesar Alves, Levy Anderson; Almeida, Janete Dias [UNESP]; Barbosa, Rommel
author_role	author
author2	de Sá Alves, Mariana [UNESP]; de Sá Rodrigues, Nayara [UNESP]; Bandeira, Celso Muller [UNESP]; Oliveira Alves, Mônica Ghislaine; Mendes, Maria Anita; Cesar Alves, Levy Anderson; Almeida, Janete Dias [UNESP]; Barbosa, Rommel
author2_role	author author author author author author author author
dc.contributor.none.fl_str_mv	Science and Technology; Universidade Estadual Paulista (UNESP); Universidade Mogi das Cruzes; Anhembi Morumbi University; Universidade de São Paulo (USP); Universidade Paulista; Universidade Municipal de São Caetano do Sul; Universidade Federal de Goiás (UFG)
dc.contributor.author.fl_str_mv	da Costa, Nattane Luíza; de Sá Alves, Mariana [UNESP]; de Sá Rodrigues, Nayara [UNESP]; Bandeira, Celso Muller [UNESP]; Oliveira Alves, Mônica Ghislaine; Mendes, Maria Anita; Cesar Alves, Levy Anderson; Almeida, Janete Dias [UNESP]; Barbosa, Rommel
dc.subject.por.fl_str_mv	Data mining; Feature selection; Machine learning; Metabolites; Oral squamous cell carcinoma; Salivary biomarkers
topic	Data mining; Feature selection; Machine learning; Metabolites; Oral squamous cell carcinoma; Salivary biomarkers
publishDate	2022
dc.date.none.fl_str_mv	2022-05-01T13:41:29Z; 2022-05-01T13:41:29Z; 2022-04-01
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://dx.doi.org/10.1016/j.compbiomed.2022.105296; Computers in Biology and Medicine, v. 143.; 1879-0534; 0010-4825; http://hdl.handle.net/11449/234108; 10.1016/j.compbiomed.2022.105296; 2-s2.0-85124169435
url	http://dx.doi.org/10.1016/j.compbiomed.2022.105296; http://hdl.handle.net/11449/234108
identifier_str_mv	Computers in Biology and Medicine, v. 143.; 1879-0534; 0010-4825; 10.1016/j.compbiomed.2022.105296; 2-s2.0-85124169435
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	Computers in Biology and Medicine
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.source.none.fl_str_mv	Scopus; reponame:Repositório Institucional da UNESP; instname:Universidade Estadual Paulista (UNESP); instacron:UNESP
instname_str	Universidade Estadual Paulista (UNESP)
instacron_str	UNESP
institution	UNESP
reponame_str	Repositório Institucional da UNESP
collection	Repositório Institucional da UNESP
repository.name.fl_str_mv	Repositório Institucional da UNESP - Universidade Estadual Paulista (UNESP)
repository.mail.fl_str_mv
_version_	1808128623112093696
