Applying Text Mining and Natural Language Processing to Electronic Medical Records for extracting and transforming texts into structured data

Bibliographic details
Main author: Benício, Diego Henrique Pegado
Publication date: 2022
Other authors: Xavier Junior, João Carlos; Paiva, Kairon Ramon Sabino de; Camargo, Juliana Dantas de Araújo Santos
Document type: Article
Language: eng
Source title: Research, Society and Development
Full text: https://rsdjournal.org/index.php/rsd/article/view/29184
Abstract: Healthcare providers usually record patient data in electronic patient records (EPRs) using free-text fields, which allow the same information to be described in different ways (e.g., abbreviations and varying terminology). In such scenarios, retrieving data from the text with SQL (Structured Query Language) queries becomes infeasible. Based on this fact, we present in this paper a tool that applies Text Mining and Natural Language Processing techniques to extract comprehensible and standardized patient data from unstructured data. Our main goal is to carry out an automatic process of extracting, cleaning, and structuring data obtained from the EPRs of pregnant patients at the Januario Cicco maternity hospital in Natal, Brazil. A total of 3,000 EPRs written in Portuguese from 2016 and 2020 were used to compare data manually retrieved by health professionals (e.g., doctors and nurses) with data retrieved by our tool. Moreover, we applied the Kruskal-Wallis statistical test to statistically evaluate the results obtained by the manual and automatic processes. The test showed no statistically significant difference between the two retrieval processes. In this sense, the final results were considerably promising.
id UNIFEI_6ee44362d093f9a07d97df85cc58270f
oai_identifier_str oai:ojs.pkp.sfu.ca:article/29184
network_acronym_str UNIFEI
network_name_str Research, Society and Development
repository_id_str
dc.title.none.fl_str_mv Applying Text Mining and Natural Language Processing to Electronic Medical Records for extracting and transforming texts into structured data
Aplicación de Minería de Texto y Procesamiento de Lenguaje Natural a Registros Médicos Electrónicos para extraer y transformar textos en datos estructurados
Aplicação de Mineração de Texto e Processamento de Linguagem Natural a Prontuários Médicos Eletrônicos para extração e transformação de textos em dados estruturados
title Applying Text Mining and Natural Language Processing to Electronic Medical Records for extracting and transforming texts into structured data
author Benício, Diego Henrique Pegado
author_facet Benício, Diego Henrique Pegado
Xavier Junior, João Carlos
Paiva, Kairon Ramon Sabino de
Camargo, Juliana Dantas de Araújo Santos
author_role author
author2 Xavier Junior, João Carlos
Paiva, Kairon Ramon Sabino de
Camargo, Juliana Dantas de Araújo Santos
author2_role author
author
author
dc.contributor.author.fl_str_mv Benício, Diego Henrique Pegado
Xavier Junior, João Carlos
Paiva, Kairon Ramon Sabino de
Camargo, Juliana Dantas de Araújo Santos
dc.subject.por.fl_str_mv Text Mining
Natural Language Processing
Electronic Medical Record
Anamnesis.
Minería de Texto
Procesamiento del Lenguaje Natural
Historia Clínica Electrónica
Anamneses.
Mineração de Texto
Processamento de Linguagem Natural
Prontuário Eletrônico
Anamnese.
topic Text Mining
Natural Language Processing
Electronic Medical Record
Anamnesis.
Minería de Texto
Procesamiento del Lenguaje Natural
Historia Clínica Electrónica
Anamneses.
Mineração de Texto
Processamento de Linguagem Natural
Prontuário Eletrônico
Anamnese.
description Healthcare providers usually record patient data in electronic patient records (EPRs) using free-text fields, which allow the same information to be described in different ways (e.g., abbreviations and varying terminology). In such scenarios, retrieving data from the text with SQL (Structured Query Language) queries becomes infeasible. Based on this fact, we present in this paper a tool that applies Text Mining and Natural Language Processing techniques to extract comprehensible and standardized patient data from unstructured data. Our main goal is to carry out an automatic process of extracting, cleaning, and structuring data obtained from the EPRs of pregnant patients at the Januario Cicco maternity hospital in Natal, Brazil. A total of 3,000 EPRs written in Portuguese from 2016 and 2020 were used to compare data manually retrieved by health professionals (e.g., doctors and nurses) with data retrieved by our tool. Moreover, we applied the Kruskal-Wallis statistical test to statistically evaluate the results obtained by the manual and automatic processes. The test showed no statistically significant difference between the two retrieval processes. In this sense, the final results were considerably promising.
publishDate 2022
dc.date.none.fl_str_mv 2022-04-30
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://rsdjournal.org/index.php/rsd/article/view/29184
10.33448/rsd-v11i6.29184
url https://rsdjournal.org/index.php/rsd/article/view/29184
identifier_str_mv 10.33448/rsd-v11i6.29184
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv https://rsdjournal.org/index.php/rsd/article/view/29184/25309
dc.rights.driver.fl_str_mv https://creativecommons.org/licenses/by/4.0
info:eu-repo/semantics/openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by/4.0
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Research, Society and Development
publisher.none.fl_str_mv Research, Society and Development
dc.source.none.fl_str_mv Research, Society and Development; Vol. 11 No. 6; e37711629184
Research, Society and Development; Vol. 11 Núm. 6; e37711629184
Research, Society and Development; v. 11 n. 6; e37711629184
2525-3409
reponame:Research, Society and Development
instname:Universidade Federal de Itajubá (UNIFEI)
instacron:UNIFEI
instname_str Universidade Federal de Itajubá (UNIFEI)
instacron_str UNIFEI
institution UNIFEI
reponame_str Research, Society and Development
collection Research, Society and Development
repository.name.fl_str_mv Research, Society and Development - Universidade Federal de Itajubá (UNIFEI)
repository.mail.fl_str_mv rsd.articles@gmail.com
_version_ 1797052836039622656