Automated ICD-9-CM medical coding of diabetic patient’s clinical reports
Main author: | Pereira, Vítor Manuel de Sousa |
---|---|
Publication date: | 2018 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10773/26001 |
Abstract: | The assignment of ICD-9-CM codes to patients’ clinical reports is a costly and tiring process performed manually by medical personnel, estimated to cost about $25 billion per year in the United States. Developing a system that automates this process has long been an ambition of researchers, but it remains an unsolved problem because of the inherent difficulty of processing unstructured clinical text. Here the problem is formulated as multi-label supervised learning, where the independent variable is the report’s text and the dependent variable is the set of assigned ICD-9-CM labels. Different variations of two neural-network-based models, the Bag-of-Tricks and the Convolutional Neural Network (CNN), are investigated. The models are trained on the diabetic-patient subset of the freely available MIMIC-III dataset. The results show that a CNN with three parallel convolutional layers achieves F1 scores of 44.51% for five-digit codes and 51.73% for three-digit, rolled-up codes. Additionally, joining several binary classifiers with the binary relevance method produces an improvement of almost 7% over the equivalent multi-label model in a restricted classification task covering only the eleven most common labels in the dataset. |
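The abstract reports F1 separately for five-digit codes and for three-digit, rolled-up codes. The sketch below illustrates what that distinction can look like in practice. It is not the thesis's implementation: the helper names `roll_up` and `micro_f1` are hypothetical, micro-averaging is an assumption (the abstract does not say which F1 averaging was used), and the toy label sets are invented for illustration only. Codes are written without the decimal point, as MIMIC-III stores them.

```python
def roll_up(code: str) -> str:
    """Truncate an ICD-9-CM diagnosis code to its category (hypothetical helper)."""
    code = code.replace(".", "").strip().upper()
    # E-codes have a four-character category (E + 3 digits);
    # numeric codes and V-codes have a three-character category.
    return code[:4] if code.startswith("E") else code[:3]


def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 over per-report label sets (averaging choice is an assumption)."""
    tp = fp = fn = 0
    for true, pred in zip(true_sets, pred_sets):
        tp += len(true & pred)   # labels correctly predicted
        fp += len(pred - true)   # labels predicted but not assigned
        fn += len(true - pred)   # labels assigned but not predicted
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example (invented, not thesis data): two reports with gold and predicted codes.
true_codes = [{"25000", "4019"}, {"25060", "3572"}]
pred_codes = [{"25002", "4019"}, {"25060"}]

full_f1 = micro_f1(true_codes, pred_codes)
rolled_f1 = micro_f1(
    [{roll_up(c) for c in s} for s in true_codes],
    [{roll_up(c) for c in s} for s in pred_codes],
)
print(f"full codes: {full_f1:.3f}, rolled up: {rolled_f1:.3f}")
```

In this toy case the prediction `25002` misses the gold code `25000` at full granularity but matches once both are rolled up to `250`, which is one reason a rolled-up score can be higher than the full-code score.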
id |
RCAP_0fbbe43092c6e43694c43ff28235896c |
---|---|
oai_identifier_str |
oai:ria.ua.pt:10773/26001 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Automated ICD-9-CM medical coding of diabetic patient’s clinical reports |
title |
Automated ICD-9-CM medical coding of diabetic patient’s clinical reports |
author |
Pereira, Vítor Manuel de Sousa |
author_facet |
Pereira, Vítor Manuel de Sousa |
author_role |
author |
dc.contributor.author.fl_str_mv |
Pereira, Vítor Manuel de Sousa |
publishDate |
2018 |
dc.date.none.fl_str_mv |
2018-01-01T00:00:00Z 2018 2019-05-09T13:39:59Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10773/26001 TID:202234223 |
url |
http://hdl.handle.net/10773/26001 |
identifier_str_mv |
TID:202234223 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
mluisa.alvim@gmail.com |
_version_ |
1817543709183442944 |