Applying Text Mining and Natural Language Processing to Electronic Medical Records for extracting and transforming texts into structured data
Main author: | Benício, Diego Henrique Pegado |
---|---|
Publication date: | 2022 |
Other authors: | Xavier Junior, João Carlos; Paiva, Kairon Ramon Sabino de; Camargo, Juliana Dantas de Araújo Santos |
Document type: | Article |
Language: | English |
Source title: | Research, Society and Development |
Full text: | https://rsdjournal.org/index.php/rsd/article/view/29184 |
Abstract: | Healthcare providers usually record patients' data in electronic patient records (EPRs) as free text, which allows the same information to be described in different ways (e.g., abbreviations, varying terminology). In such scenarios, retrieving data from the text with SQL (Structured Query Language) queries becomes unfeasible. In this paper, we therefore present a tool that applies Text Mining and Natural Language Processing techniques to extract comprehensible, standardized patient data from unstructured text. Our main goal is to automate the extraction, cleaning and structuring of data from the EPRs of pregnant patients at the Januário Cicco maternity hospital in Natal, Brazil. We used 3,000 EPRs written in Portuguese between 2016 and 2020 to compare data retrieved manually by health professionals (e.g., doctors and nurses) with data retrieved by our tool. We applied the Kruskal-Wallis test to statistically compare the results of the manual and automatic processes. The test showed no statistically significant difference between the two retrieval processes; in this sense, the results are promising. |
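The record does not describe how the tool parses the notes or how the test was run. As a minimal sketch under stated assumptions, the Python below illustrates the two ideas in the abstract: a hypothetical regular expression that maps several free-text spellings of gestational age (e.g. "IG: 32 sem", "idade gestacional de 40 semanas") to one integer, and a from-scratch Kruskal-Wallis H statistic (average ranks for ties, standard tie correction) used to compare manually and automatically retrieved values. The pattern `GEST_AGE`, all function names and the sample notes are invented for illustration and are not the authors' method or data.

```python
import math
import re

# Hypothetical pattern (not from the paper): gestational age ("idade
# gestacional", often abbreviated "IG") written in several free-text styles,
# e.g. "IG: 32 sem", "ig 38s", "idade gestacional de 40 semanas".
GEST_AGE = re.compile(
    r"\b(?:ig|idade gestacional)\b\D{0,5}(\d{1,2})\s*(?:s|sem|semanas)\b",
    re.IGNORECASE,
)


def extract_gestational_age(note):
    """Return the gestational age in weeks found in a note, or None."""
    match = GEST_AGE.search(note)
    return int(match.group(1)) if match else None


def average_ranks(values):
    """Rank values 1..N; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic, corrected for ties.

    Groups must contain comparable numbers (drop failed extractions first).
    """
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 * total / (n * (n + 1)) - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    if ties == n ** 3 - n:
        raise ValueError("all observations are identical; H is undefined")
    return h / (1 - ties / (n ** 3 - n))


def p_value_two_groups(h):
    """Chi-squared (1 degree of freedom) approximation of the p-value."""
    return math.erfc(math.sqrt(max(h, 0.0) / 2))


if __name__ == "__main__":
    # Invented toy data: notes for five patients and the values a clinician
    # typed manually for the same patients.
    notes = [
        "IG: 32 sem, PA 120x80",
        "ig 38s",
        "idade gestacional de 40 semanas",
        "IG 28 sem",
        "G2P1, ig: 36 semanas",
    ]
    manual = [32, 38, 40, 28, 36]
    automatic = [extract_gestational_age(note) for note in notes]
    h = kruskal_wallis_h(manual, automatic)
    print(f"H = {h:.3f}, p = {p_value_two_groups(h):.3f}")  # prints: H = 0.000, p = 1.000
```

For two groups, H is compared against a chi-squared distribution with one degree of freedom, which is what `p_value_two_groups` approximates; in practice `scipy.stats.kruskal` returns the same tie-corrected statistic together with its p-value. A non-significant result means the test found no evidence of a difference between the two retrieval processes, not proof that they agree value by value.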
id |
UNIFEI_6ee44362d093f9a07d97df85cc58270f |
---|---|
oai_identifier_str |
oai:ojs.pkp.sfu.ca:article/29184 |
network_acronym_str |
UNIFEI |
network_name_str |
Research, Society and Development |
copyright |
Copyright (c) 2022 Diego Henrique Pegado Benício; João Carlos Xavier Junior; Kairon Ramon Sabino de Paiva; Juliana Dantas de Araújo Santos Camargo |
author |
Benício, Diego Henrique Pegado |
author_facet |
Benício, Diego Henrique Pegado; Xavier Junior, João Carlos; Paiva, Kairon Ramon Sabino de; Camargo, Juliana Dantas de Araújo Santos |
dc.subject.por.fl_str_mv |
Text Mining; Natural Language Processing; Electronic Medical Record; Anamnesis |
dc.date.none.fl_str_mv |
2022-04-30 |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion |
dc.identifier.uri.fl_str_mv |
https://rsdjournal.org/index.php/rsd/article/view/29184 10.33448/rsd-v11i6.29184 |
dc.relation.none.fl_str_mv |
https://rsdjournal.org/index.php/rsd/article/view/29184/25309 |
dc.rights.driver.fl_str_mv |
https://creativecommons.org/licenses/by/4.0 info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Research, Society and Development |
dc.source.none.fl_str_mv |
Research, Society and Development; Vol. 11 No. 6; e37711629184; ISSN 2525-3409 |
instname_str |
Universidade Federal de Itajubá (UNIFEI) |
repository.name.fl_str_mv |
Research, Society and Development - Universidade Federal de Itajubá (UNIFEI) |
repository.mail.fl_str_mv |
rsd.articles@gmail.com |