Stroke Outcome Measurements from Electronic Medical Records: On the Effectiveness of Neural and Nonneural Classifiers

Zanotto, Bruna; Etges, Ana; dal Bosco, Avner; Cortes, Eduardo; Ruschenll, Renata; Souza, Ana; Andrade, Claudio; Viegas, Felipe; Canuto, Sergio; Luiz, Washington; Martins, Sheila; Vieira, Renata; Polanczyk, Carisi; Gonçalves, Marcos

Bibliographic details
Main author:	Zanotto, Bruna
Publication date:	2021
Other authors:	Etges, Ana; dal Bosco, Avner; Cortes, Eduardo; Ruschenll, Renata; Souza, Ana; Andrade, Claudio; Viegas, Felipe; Canuto, Sergio; Luiz, Washington; Martins, Sheila; Vieira, Renata; Polanczyk, Carisi; Gonçalves, Marcos
Document type:	Article
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/10174/30381 https://doi.org/10.2196/29120
Abstract:	Background: With the rapid adoption of electronic medical records (EMRs), there is an ever-increasing opportunity to collect data and extract knowledge from EMRs to support patient-centered stroke management.
Objective: The research reported in this article aims to compare the effectiveness of state-of-the-art automatic text classification methods in classifying data to support the prediction of clinical patient outcomes and the extraction of patient characteristics from EMRs.
Methods: Our study addressed the computational problem of information extraction and automatic text classification. We identified essential tasks to be considered in an ischemic stroke value-based program. The 30 selected tasks were classified (manually labeled by specialists) according to the following value agenda: Tier 1 (achieved healthcare status), Tier 2 (recovery process), care-related (clinical management and risk scores), and baseline characteristics. The analyzed dataset was retrospectively extracted from the EMRs of stroke patients from a private Brazilian hospital between 2018 and 2019. A total of 44,206 sentences from free-text medical records in Portuguese were used to train and develop 10 supervised machine learning (ML) methods, including state-of-the-art neural and non-neural methods, along with ontological rules. As an experimental protocol, we used a 5-fold cross-validation procedure repeated 6 times, along with subject-wise sampling. A heatmap was used to display comparative result analyses according to the best algorithmic effectiveness (F1-score), supported by statistical significance tests. Feature importance analysis was conducted to provide insights regarding the results.
Results: The top-performing models were support vector machines (SVM) trained with lexical and semantic textual features, showing the importance of dealing with noise in EMR textual representations. The SVM models produced statistically superior results in 17 of 24 tasks (71%), with an F1-score > 80% in care-related tasks (patient treatment location, fall risk, thrombolytic therapy, and pressure ulcer risk), the process of recovery (ability to feed orally/ambulate and communicate), healthcare status achieved (mortality), and baseline characteristics (diabetes, obesity, dyslipidemia, and smoking status). Neural methods were largely outperformed by more traditional non-neural methods given the characteristics of the dataset. Ontological rules were also effective in tasks such as baseline characteristics (alcoholism, atrial fibrillation, and coronary artery disease) and the Rankin scale. The complementarity in effectiveness among models suggests that a combination of models could enhance the results and cover more tasks in the future.
Conclusions: Advances in information technology capacity are essential for scalability and agility in measuring health status outcomes. This study allowed us to measure effectiveness and identify opportunities for automating the classification of outcomes of specific tasks related to stroke victims' clinical conditions and thus, ultimately, to assess the possibility of proactively using these machine-learning techniques in real-world situations.
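
The evaluation protocol named in the abstract (5-fold cross-validation repeated 6 times with subject-wise sampling, scored by F1) can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function names `subject_wise_folds` and `f1_score`, the round-robin assignment of patients to folds, the fixed seed, and the binary (single positive class) form of F1 are all assumptions made for illustration. The key property shown is that every sentence from a given patient lands on the same side of each split, so a model is never tested on a patient it has seen during training.

```python
import random
from collections import defaultdict


def subject_wise_folds(subject_ids, n_folds=5, n_repeats=6, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) tuples.

    subject_ids[i] is the (hypothetical) patient identifier of sentence i.
    Whole patients, not individual sentences, are assigned to folds, so no
    patient appears in both the training and the test side of a split.
    """
    by_subject = defaultdict(list)
    for idx, subject in enumerate(subject_ids):
        by_subject[subject].append(idx)
    subjects = sorted(by_subject)
    rng = random.Random(seed)  # assumed seed, only for reproducibility
    for repeat in range(n_repeats):
        shuffled = subjects[:]
        rng.shuffle(shuffled)  # a fresh patient ordering for each repetition
        for fold in range(n_folds):
            # Round-robin assignment: every n_folds-th patient is held out.
            test_subjects = set(shuffled[fold::n_folds])
            test_idx = [i for s in shuffled if s in test_subjects
                        for i in by_subject[s]]
            train_idx = [i for s in shuffled if s not in test_subjects
                         for i in by_subject[s]]
            yield repeat, fold, train_idx, test_idx


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: F1 is defined as 0 here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

With these defaults the generator produces 5 × 6 = 30 train/test splits per task; a classifier would be fitted on `train_idx` and scored with `f1_score` on `test_idx` for each split, and the 30 scores could then feed a significance test between methods.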

Item metadata

id	RCAP_95dad2c7a3269c789f99f883dd7e0784
oai_identifier_str	oai:dspace.uevora.pt:10174/30381
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
dc.title.none.fl_str_mv	Stroke Outcome Measurements from Electronic Medical Records: On the Effectiveness of Neural and Nonneural Classifiers
author	Zanotto, Bruna
author_facet	Zanotto, Bruna Etges, Ana dal Bosco, Avner Cortes, Eduardo Ruschenll, Renata Souza, Ana Andrade, Claudio Viegas, Felipe Canuto, Sergio Luiz, Washington Martins, Sheila Vieira, Renata Polanczyk, Carisi Gonçalves, Marcos
author_role	author
author2	Etges, Ana dal Bosco, Avner Cortes, Eduardo Ruschenll, Renata Souza, Ana Andrade, Claudio Viegas, Felipe Canuto, Sergio Luiz, Washington Martins, Sheila Vieira, Renata Polanczyk, Carisi Gonçalves, Marcos
author2_role	author author author author author author author author author author author author author
dc.contributor.author.fl_str_mv	Zanotto, Bruna Etges, Ana dal Bosco, Avner Cortes, Eduardo Ruschenll, Renata Souza, Ana Andrade, Claudio Viegas, Felipe Canuto, Sergio Luiz, Washington Martins, Sheila Vieira, Renata Polanczyk, Carisi Gonçalves, Marcos
dc.subject.por.fl_str_mv	Electronic Health Records text classification
topic	Electronic Health Records text classification
publishDate	2021
dc.date.none.fl_str_mv	2021-12-03T11:07:52Z 2021-12-03 2021-01-01T00:00:00Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10174/30381 https://doi.org/10.2196/29120
url	http://hdl.handle.net/10174/30381 https://doi.org/10.2196/29120
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	Zanotto BS, Beck da Silva Etges AP, dal Bosco A, Cortes EG, Ruschel R, De Souza AC, Andrade CMV, Viegas F, Canuto S, Luiz W, Ouriques Martins S, Vieira R, Polanczyk C, André Gonçalves M. Stroke Outcome Measurements From Electronic Medical Records: Cross-sectional Study on the Effectiveness of Neural and Nonneural Classifiers. JMIR Med Inform 2021;9(11):e29120. doi: 10.2196/29120 https://medinform.jmir.org/2021/11/e29120 renatav@uevora.pt
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799136677990498304
