Unsupervised algorithms to identify potential under-coding of secondary diagnoses in hospitalisations databases in Portugal

Portela, Diana; Amaral, Rita; Rodrigues, Pedro P.; Freitas, Alberto; Costa, Elísio; Fonseca, João A.; Sousa-Pinto, Bernardo

Bibliographic details
Main author:	Portela, Diana
Publication date:	2023
Other authors:	Amaral, Rita; Rodrigues, Pedro P.; Freitas, Alberto; Costa, Elísio; Fonseca, João A.; Sousa-Pinto, Bernardo
Document type:	Article
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/10400.22/22714
Abstract:	Quantifying and dealing with lack of consistency in administrative databases (namely, under-coding) requires tracking patients longitudinally without compromising anonymity, which is often a challenging task. This study aimed to (i) assess and compare different hierarchical clustering methods for identifying individual patients in an administrative database that does not easily allow tracking of episodes from the same patient; (ii) quantify the frequency of potential under-coding; and (iii) identify factors associated with such phenomena. We analysed the Portuguese National Hospital Morbidity Dataset, an administrative database registering all hospitalisations occurring in Mainland Portugal between 2011 and 2015. We applied different hierarchical clustering approaches (either isolated or combined with partitional clustering methods) to identify potential individual patients based on demographic variables and comorbidities. Diagnosis codes were grouped into the Charlson and Elixhauser comorbidity defined groups. The algorithm displaying the best performance was used to quantify potential under-coding. A generalised mixed model (GML) of binomial regression was applied to assess factors associated with such potential under-coding. We observed that the hierarchical cluster analysis (HCA) + k-means clustering method, with comorbidities grouped according to the Charlson defined groups, was the algorithm displaying the best performance (with a Rand Index of 0.99997). We identified potential under-coding in all Charlson comorbidity groups, ranging from 3.5% (overall diabetes) to 27.7% (asthma). Overall, being male, having a medical admission, dying during hospitalisation or being admitted to more specific and complex hospitals were associated with increased odds of potential under-coding.
We assessed several approaches to identify individual patients in an administrative database and, subsequently, by applying the HCA + k-means algorithm, we tracked coding inconsistency and potentially improved data quality. We reported consistent potential under-coding in all defined groups of comorbidities and potential factors associated with such lack of completeness. Our proposed methodological framework could both enhance data quality and act as a reference for other studies relying on databases with similar problems.
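
The abstract reports clustering performance as a Rand Index but does not say how it was computed or which reference partition was used. As a minimal sketch of the standard definition only (the share of episode pairs on which two partitions agree), and not the authors' implementation, the following Python computes it for toy data. The function name `rand_index` and the example labels are hypothetical.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Return the fraction of item pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two items
    together, or both partitions keep them apart.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    if len(labels_true) < 2:
        raise ValueError("need at least two items to form a pair")

    agreements = 0
    total_pairs = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        agreements += same_true == same_pred
        total_pairs += 1
    return agreements / total_pairs


# Hypothetical toy data: six hospitalisation episodes from three patients,
# and the cluster each episode was assigned to by some algorithm.
true_patient = ["A", "A", "B", "B", "C", "C"]
cluster_id = [0, 0, 1, 1, 1, 2]

score = rand_index(true_patient, cluster_id)
print(f"Rand Index: {score:.3f}")
```

This pairwise loop is quadratic in the number of episodes, so it only suits small illustrations; a national database would need a contingency-table formulation instead.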

Item metadata

id	RCAP_5fd7ff69045452b114ef72f369d00baa
oai_identifier_str	oai:recipp.ipp.pt:10400.22/22714
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
dc.title.none.fl_str_mv	Unsupervised algorithms to identify potential under-coding of secondary diagnoses in hospitalisations databases in Portugal
author	Portela, Diana
author_facet	Portela, Diana Amaral, Rita Rodrigues, Pedro P. Freitas, Alberto Costa, Elísio Fonseca, João A. Sousa-Pinto, Bernardo
author_role	author
author2	Amaral, Rita Rodrigues, Pedro P. Freitas, Alberto Costa, Elísio Fonseca, João A. Sousa-Pinto, Bernardo
author2_role	author author author author author author
dc.contributor.none.fl_str_mv	Repositório Científico do Instituto Politécnico do Porto
dc.contributor.author.fl_str_mv	Portela, Diana Amaral, Rita Rodrigues, Pedro P. Freitas, Alberto Costa, Elísio Fonseca, João A. Sousa-Pinto, Bernardo
dc.subject.por.fl_str_mv	Data quality Public health informatics Medical records: evaluation Health information management
publishDate	2023
dc.date.none.fl_str_mv	2023-04-12T15:18:14Z 2023-02-17 2023-02-17T00:00:00Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10400.22/22714
url	http://hdl.handle.net/10400.22/22714
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	Portela D, Amaral R, Rodrigues PP, et al. Unsupervised algorithms to identify potential under-coding of secondary diagnoses in hospitalisations databases in Portugal. Health Information Management Journal. 2023;0(0). doi:10.1177/18333583221144663 1833-3583 10.1177/18333583221144663 1833-3575
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	SAGE Journals
publisher.none.fl_str_mv	SAGE Journals
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799131577320472576
