Stroke Outcome Measurements from Electronic Medical Records: On the Effectiveness of Neural and Nonneural Classifiers

Bibliographic details
Main author: Zanotto, Bruna
Publication date: 2021
Other authors: Etges, Ana, dal Bosco, Avner, Cortes, Eduardo, Ruschenll, Renata, Souza, Ana, Andrade, Claudio, Viegas, Felipe, Canuto, Sergio, Luiz, Washington, Martins, Sheila, Vieira, Renata, Polanczyk, Carisi, Gonçalves, Marcos
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10174/30381
https://doi.org/10.2196/29120
Abstract: Background: With the rapid adoption of electronic medical records (EMRs), there is an ever-increasing opportunity to collect data and extract knowledge from EMRs to support patient-centered stroke management. Objective: The research reported in this article aims to compare the effectiveness of state-of-the-art automatic text classification methods in classifying data to support the prediction of clinical patient outcomes and the extraction of patient characteristics from EMRs. Methods: Our study addressed the computational problem of information extraction and automatic text classification. We identified essential tasks to be considered in an ischemic stroke value-based program. The 30 selected tasks were classified (manually labeled by specialists) according to the following value agenda: Tier 1 (achieved healthcare status), Tier 2 (recovery process), care-related (clinical management and risk scores), and baseline characteristics. The analyzed dataset was retrospectively extracted from the EMRs of stroke patients from a private Brazilian hospital between 2018 and 2019. A total of 44,206 sentences from free-text medical records in Portuguese were used to train and develop 10 supervised computational machine learning (ML) methods, including state-of-the-art neural and non-neural methods, along with ontological rules. As an experimental protocol, we used a 5-fold cross-validation procedure repeated 6 times, along with subject-wise sampling. A heatmap was used to display comparative result analyses according to the best algorithmic effectiveness (F1-score), supported by statistical significance tests. Feature importance analysis was conducted to provide insights regarding the results. Results: The top-performing models were support vector machines (SVM) trained with lexical and semantic textual features, showing the importance of dealing with noise in the textual representations of EMRs. The SVM models produced statistically superior results in a total of 17 tasks out of 24 (70%), with an F1-score > 80% regarding care-related tasks (patient treatment location, fall risk, thrombolytic therapy, and pressure ulcer risk), the process of recovery (ability to feed orally/ambulate and communicate), healthcare status achieved (mortality), and baseline characteristics (diabetes, obesity, dyslipidemia, and smoking status). Neural methods were largely outperformed by more traditional non-neural methods given the characteristics of the dataset. Ontological rules were also effective in tasks such as baseline characteristics (alcoholism, atrial fibrillation, and coronary artery disease) and the Rankin scale. The complementarity in effectiveness among models suggests that a combination of models could enhance the results and cover more tasks in the future. Conclusions: Advances in information technology capacity are essential for scalability and agility in measuring health status outcomes. This study allowed us to measure effectiveness and identify opportunities for automating the classification of outcomes of specific tasks related to stroke victims' clinical conditions and thus, ultimately, to assess the possibility of proactively using these machine-learning techniques in real-world situations.
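The evaluation protocol named in the abstract (5-fold cross-validation repeated 6 times with subject-wise sampling, scored by F1) can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not describe the implementation, so the function names, the round-robin fold assignment, the seed, and the single-class F1 shown here are all assumptions made for illustration. The one property taken from the abstract is that all sentences from one patient stay on the same side of every split, so that no patient contributes to both training and testing within a fold.

```python
import random
from collections import defaultdict


def subject_wise_repeated_kfold(subject_ids, n_splits=5, n_repeats=6, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) tuples.

    Hypothetical helper: sentences are grouped by subject (patient) so that
    no subject appears in both the train and test side of a fold.
    """
    # Group sentence indices by the subject they belong to.
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)
    subjects = sorted(by_subject)
    if len(subjects) < n_splits:
        raise ValueError("need at least n_splits distinct subjects")

    rng = random.Random(seed)
    for repeat in range(n_repeats):
        shuffled = subjects[:]
        rng.shuffle(shuffled)  # a fresh subject ordering for each repetition
        # Deal subjects round-robin into n_splits folds (an assumed scheme).
        folds = [shuffled[k::n_splits] for k in range(n_splits)]
        for fold, test_subjects in enumerate(folds):
            held_out = set(test_subjects)
            test_idx = [i for s in test_subjects for i in by_subject[s]]
            train_idx = [i for s in shuffled if s not in held_out
                         for i in by_subject[s]]
            yield repeat, fold, train_idx, test_idx


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

With the default arguments the generator produces 5 x 6 = 30 train/test splits per task; a classifier would be fitted on `train_idx`, scored on `test_idx`, and the 30 F1 values compared across methods. How the study averaged F1 over classes is not stated in the abstract, so the single-class definition above is only one possible reading.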
id RCAP_95dad2c7a3269c789f99f883dd7e0784
oai_identifier_str oai:dspace.uevora.pt:10174/30381
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Stroke Outcome Measurements from Electronic Medical Records: On the Effectiveness of Neural and Nonneural Classifiers
author Zanotto, Bruna
author_facet Zanotto, Bruna
Etges, Ana
dal Bosco, Avner
Cortes, Eduardo
Ruschenll, Renata
Souza, Ana
Andrade, Claudio
Viegas, Felipe
Canuto, Sergio
Luiz, Washington
Martins, Sheila
Vieira, Renata
Polanczyk, Carisi
Gonçalves, Marcos
author_role author
author2 Etges, Ana
dal Bosco, Avner
Cortes, Eduardo
Ruschenll, Renata
Souza, Ana
Andrade, Claudio
Viegas, Felipe
Canuto, Sergio
Luiz, Washington
Martins, Sheila
Vieira, Renata
Polanczyk, Carisi
Gonçalves, Marcos
author2_role author
author
author
author
author
author
author
author
author
author
author
author
author
dc.subject.por.fl_str_mv Electronic Health Records
text classification
publishDate 2021
dc.date.none.fl_str_mv 2021-12-03T11:07:52Z
2021-12-03
2021-01-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10174/30381
https://doi.org/10.2196/29120
dc.language.iso.fl_str_mv eng
dc.relation.none.fl_str_mv Zanotto BS, Beck da Silva Etges AP, dal Bosco A, Cortes EG, Ruschel R, De Souza AC, Andrade CMV, Viegas F, Canuto S, Luiz W, Ouriques Martins S, Vieira R, Polanczyk C, André Gonçalves M Stroke Outcome Measurements From Electronic Medical Records: Cross-sectional Study on the Effectiveness of Neural and Nonneural Classifiers JMIR Med Inform 2021;9(11):e29120 doi: 10.2196/29120
https://medinform.jmir.org/2021/11/e29120
renatav@uevora.pt
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799136677990498304