Stroke Outcome Measurements from Electronic Medical Records: On the Effectiveness of Neural and Nonneural Classifiers
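Main author: | Zanotto, Bruna |
---|---|
Publication date: | 2021 |
Other authors: | Etges, Ana; dal Bosco, Avner; Cortes, Eduardo; Ruschenll, Renata; Souza, Ana; Andrade, Claudio; Viegas, Felipe; Canuto, Sergio; Luiz, Washington; Martins, Sheila; Vieira, Renata; Polanczyk, Carisi; Gonçalves, Marcos |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10174/30381 https://doi.org/10.2196/29120 |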
Abstract: | Background: With the rapid adoption of electronic medical records (EMRs), there is an ever-increasing opportunity to collect data and extract knowledge from EMRs to support patient-centered stroke management. Objective: This study compares the effectiveness of state-of-the-art automatic text classification methods in classifying data to support the prediction of clinical patient outcomes and the extraction of patient characteristics from EMRs. Methods: Our study addressed the computational problem of information extraction and automatic text classification. We identified essential tasks to be considered in an ischemic stroke value-based program. The 30 selected tasks were classified (manually labeled by specialists) according to the following value agenda: Tier 1 (achieved healthcare status), Tier 2 (recovery process), care-related (clinical management and risk scores), and baseline characteristics. The analyzed dataset was retrospectively extracted from the EMRs of stroke patients at a private Brazilian hospital between 2018 and 2019. A total of 44,206 sentences from free-text medical records in Portuguese were used to train and develop 10 supervised machine learning (ML) methods, including state-of-the-art neural and non-neural methods, along with ontological rules. As an experimental protocol, we used a 5-fold cross-validation procedure repeated 6 times, along with subject-wise sampling. A heatmap was used to display comparative result analyses according to the best algorithmic effectiveness (F1-score), supported by statistical significance tests. Feature importance analysis was conducted to provide insights regarding the results. Results: The top-performing models were support vector machines (SVMs) trained with lexical and semantic textual features, showing the importance of dealing with noise in the textual representations of EMRs. 
The SVM models produced statistically superior results in 17 of 24 tasks (70%), with an F1-score > 80% for care-related tasks (patient treatment location, fall risk, thrombolytic therapy, and pressure ulcer risk), the process of recovery (ability to feed orally, ambulate, and communicate), healthcare status achieved (mortality), and baseline characteristics (diabetes, obesity, dyslipidemia, and smoking status). Neural methods were largely outperformed by more traditional non-neural methods given the characteristics of the dataset. Ontological rules were also effective in tasks such as baseline characteristics (alcoholism, atrial fibrillation, and coronary artery disease) and the Rankin scale. The complementarity in effectiveness among models suggests that a combination of models could enhance the results and cover more tasks in the future. Conclusions: Advances in information technology capacity are essential for scalability and agility in measuring health status outcomes. This study allowed us to measure effectiveness and identify opportunities for automating the classification of outcomes of specific tasks related to stroke victims’ clinical conditions, and thus to assess the possibility of proactively using these machine learning techniques in real-world situations. |
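The experimental protocol named in the abstract (5-fold cross-validation repeated 6 times with subject-wise sampling, scored by F1) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the round-robin fold assignment, the seed handling, and the use of a binary F1 for a single positive class are all assumptions made for illustration. The point it shows is that all sentences from one patient stay on the same side of each split.

```python
import random
from collections import defaultdict


def subject_wise_repeated_kfold(subject_ids, n_splits=5, n_repeats=6, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) tuples.

    Hypothetical helper: folds are built over subjects, not sentences,
    so no subject contributes samples to both train and test.
    """
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)
    subjects = sorted(by_subject)
    if len(subjects) < n_splits:
        raise ValueError("need at least n_splits distinct subjects")
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        shuffled = subjects[:]
        rng.shuffle(shuffled)  # a fresh subject ordering for every repeat
        for fold in range(n_splits):
            # Round-robin assignment of shuffled subjects to the test fold.
            test_subjects = set(shuffled[fold::n_splits])
            train_idx, test_idx = [], []
            for sid in subjects:
                target = test_idx if sid in test_subjects else train_idx
                target.extend(by_subject[sid])
            yield repeat, fold, sorted(train_idx), sorted(test_idx)


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for one positive class (assumed metric variant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # also covers the case with no positive predictions
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data: one entry per sentence, keyed by a made-up patient ID.
    sentence_patients = ["p1", "p1", "p2", "p3", "p3", "p4", "p5", "p5", "p6", "p7"]
    splits = list(subject_wise_repeated_kfold(sentence_patients))
    print(len(splits), "train/test splits")
```

With 5 folds and 6 repeats, each classifier is evaluated on 30 splits per task, which gives the paired scores that a statistical significance test needs. Splitting by sentence instead would let text from the same patient appear in both train and test, which tends to inflate the measured F1.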
id |
RCAP_95dad2c7a3269c789f99f883dd7e0784 |
---|---|
oai_identifier_str |
oai:dspace.uevora.pt:10174/30381 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Stroke Outcome Measurements from Electronic Medical Records: On the Effectiveness of Neural and Nonneural Classifiers |
author |
Zanotto, Bruna |
author_facet |
Zanotto, Bruna Etges, Ana dal Bosco, Avner Cortes, Eduardo Ruschenll, Renata Souza, Ana Andrade, Claudio Viegas, Felipe Canuto, Sergio Luiz, Washington Martins, Sheila Vieira, Renata Polanczyk, Carisi Gonçalves, Marcos |
author_role |
author |
author2 |
Etges, Ana dal Bosco, Avner Cortes, Eduardo Ruschenll, Renata Souza, Ana Andrade, Claudio Viegas, Felipe Canuto, Sergio Luiz, Washington Martins, Sheila Vieira, Renata Polanczyk, Carisi Gonçalves, Marcos |
author2_role |
author author author author author author author author author author author author author |
dc.contributor.author.fl_str_mv |
Zanotto, Bruna Etges, Ana dal Bosco, Avner Cortes, Eduardo Ruschenll, Renata Souza, Ana Andrade, Claudio Viegas, Felipe Canuto, Sergio Luiz, Washington Martins, Sheila Vieira, Renata Polanczyk, Carisi Gonçalves, Marcos |
dc.subject.por.fl_str_mv |
Electronic Health Records; text classification |
topic |
Electronic Health Records; text classification |
publishDate |
2021 |
dc.date.none.fl_str_mv |
2021-12-03T11:07:52Z 2021-12-03 2021-01-01T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10174/30381 http://hdl.handle.net/10174/30381 https://doi.org/10.2196/29120 |
url |
http://hdl.handle.net/10174/30381 https://doi.org/10.2196/29120 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
Zanotto BS, Beck da Silva Etges AP, dal Bosco A, Cortes EG, Ruschel R, De Souza AC, Andrade CMV, Viegas F, Canuto S, Luiz W, Ouriques Martins S, Vieira R, Polanczyk C, André Gonçalves M. Stroke Outcome Measurements From Electronic Medical Records: Cross-sectional Study on the Effectiveness of Neural and Nonneural Classifiers. JMIR Med Inform 2021;9(11):e29120. doi: 10.2196/29120. https://medinform.jmir.org/2021/11/e29120 renatav@uevora.pt |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799136677990498304 |