Automated ICD-9-CM medical coding of diabetic patient’s clinical reports

Bibliographic details
Main author: Pereira, Vítor Manuel de Sousa
Publication date: 2018
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10773/26001
Abstract: Assigning ICD-9-CM codes to patients' clinical reports is a costly and tedious process carried out manually by medical personnel, estimated to cost about $25 billion per year in the United States. Researchers have long sought to automate this process, but it remains an unsolved problem because of the inherent difficulty of processing unstructured clinical text. Here the problem is formulated as multi-label supervised learning, where the independent variable is the report's text and the dependent variables are the assigned ICD-9-CM labels. Different variations of two neural-network-based models, the Bag-of-Tricks and the Convolutional Neural Network (CNN), are investigated. The models are trained on the diabetic patient subset of the freely available MIMIC-III dataset. The results show that a CNN with three parallel convolutional layers achieves F1 scores of 44.51% for five-digit codes and 51.73% for three-digit, rolled-up codes. Additionally, joining several binary classifiers with the binary relevance method yields an improvement of almost 7% over the equivalent multi-label model in a restricted classification task covering only the eleven most common labels in the dataset.
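The abstract reports F1 at two granularities: full five-digit codes and three-digit, rolled-up codes. The sketch below illustrates why the rolled-up score can be higher. It is a minimal illustration under stated assumptions, not the thesis's evaluation code: it assumes codes are stored as strings without the decimal point, that F1 is micro-averaged over (report, label) pairs (the abstract does not say which averaging is used), and the helper names `roll_up` and `micro_f1` and the toy label sets are hypothetical.

```python
def roll_up(code: str) -> str:
    """Truncate an ICD-9-CM code to its category.

    Assumption: E codes use a four-character category (E + 3 digits);
    numeric and V codes use three characters.
    """
    code = code.replace(".", "").strip().upper()
    return code[:4] if code.startswith("E") else code[:3]


def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 over all (report, label) pairs."""
    tp = fp = fn = 0
    for true, pred in zip(true_sets, pred_sets):
        tp += len(true & pred)   # labels predicted and correct
        fp += len(pred - true)   # labels predicted but wrong
        fn += len(true - pred)   # labels missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy label sets for two reports (illustrative only, not from the thesis).
gold = [{"25000", "4019"}, {"25060", "3572"}]
pred = [{"25002", "4019"}, {"25060"}]

full_f1 = micro_f1(gold, pred)
rolled_f1 = micro_f1(
    [{roll_up(c) for c in labels} for labels in gold],
    [{roll_up(c) for c in labels} for labels in pred],
)
print(f"five-digit F1: {full_f1:.3f}, rolled-up F1: {rolled_f1:.3f}")
```

In this toy case the prediction `25002` counts as an error against `25000` at full granularity but as a match once both are rolled up to `250`, so the rolled-up score is the higher of the two.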
id RCAP_0fbbe43092c6e43694c43ff28235896c
oai_identifier_str oai:ria.ua.pt:10773/26001
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
publishDate 2018
dc.date.none.fl_str_mv 2018-01-01T00:00:00Z
2018
2019-05-09T13:39:59Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10773/26001
TID:202234223
url http://hdl.handle.net/10773/26001
identifier_str_mv TID:202234223
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação