Consumer health vocabulary in Brazilian Portuguese language (Vocabulário de saúde do consumidor em idioma português)

Bibliographic details
Main author: Tenorio, Josceli Maria [UNIFESP]
Publication date: 2019
Document type: Thesis
Language: por
Source title: Repositório Institucional da UNIFESP
Full text: https://sucupira.capes.gov.br/sucupira/public/consultas/coleta/trabalhoConclusao/viewTrabalhoConclusao.jsf?popup=true&id_trabalho=9178759
https://repositorio.unifesp.br/handle/11600/59994
Abstract: Introduction: Studies show a gap between the common terms used by laypersons and the technical terms used by healthcare professionals. Consumer health vocabularies (CHV) have been proposed as a solution to this gap, particularly when they incorporate technologies that make the content available and integrated, together with the semantic relationships between terms. Objective: To develop a Brazilian Portuguese CHV model based on web data sources and structured according to semantic web principles and technologies. Method: The study was divided into three phases. In Phase 1, terms were collected and extracted from structured web data sources, such as the Unified Medical Language System (UMLS) controlled vocabularies and the DBpedia knowledge base. These terms and their semantic relationships were represented as a complex network. Network centrality measures were computed to characterise it. The terms to compose the CHV were selected using network clustering techniques. Phase 2 applied two strategies to obtain terms not found in Phase 1 from unstructured web data sources written by and/or for health consumers: recognition of UMLS terms, and automatic term recognition techniques to identify candidate terms. A human validation process was conducted to approve these candidate terms and insert them into the CHV. In Phase 3, the CHV data were formalised and represented using the Resource Description Framework (RDF) data model. In addition, a web interface was built to give users access to the dataset. Results: Phase 1 resulted in a complex network containing 146,956 terms linked by semantic relationships such as synonymy, hypernymy, and related terms, of which 31,439 are UMLS concepts represented by preferred terms and 83,279 are synonyms. DBpedia raised the synonym-per-concept rate from 1.6 to 2.5. Centrality measures revealed characteristics of the complex network and highlighted its most significant terms. Phase 2 resulted in the automatic recognition of 5,916 UMLS terms. The automatic term recognition algorithm identified 9,674 candidate n-grams. Human validation approved 6,245 terms, around 66.24% of the candidate terms assessed. The precision-recall curve of the automatic term recognition algorithm resulted in [0.732-~0.900], a higher value than those reported in similar studies. In Phase 3, the data were formalised using the Simple Knowledge Organization System (SKOS) data model and the Provenance, Authoring and Versioning (PAV) ontology, which proved suitable for formalising the CHV content and support the RDF data model. The CHV-RDF contains 150,995 terms, of which 66,992 are preferred terms and 84,003 are synonyms, in addition to the mapping of other semantic relationships between terms based on hierarchy and association. Conclusion: It was possible to build a CHV model automatically, through computational techniques, using data sources available on the web. The complex network model made it possible to link terms from controlled and consumer vocabularies and to represent their semantic relationships, and it served as the basis for the CHV-RDF data model. Previously unrecorded synonyms, terms and relationships were identified. This study presented a data infrastructure that could be used to develop consumer-oriented applications, and proposed a method for developing health vocabularies in other languages and for updating existing vocabularies.
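The SKOS-based formalisation mentioned in the abstract can be pictured with a minimal sketch. It is not the thesis's actual schema: the `chv:` namespace, the concept identifiers, the example labels and the helper names `escape` and `concept_to_turtle` are all assumptions made for illustration. Only the SKOS properties (`skos:prefLabel`, `skos:altLabel`, `skos:broader`) and the PAV property `pav:importedFrom` come from the published vocabularies. The sketch serialises one concept, with a preferred term and its synonyms, as RDF in Turtle syntax using only the Python standard library.

```python
# Illustrative sketch only: namespaces marked hypothetical are NOT from the thesis.
SKOS = "http://www.w3.org/2004/02/skos/core#"
PAV = "http://purl.org/pav/"
CHV = "http://example.org/chv/"  # hypothetical namespace for this example


def escape(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def concept_to_turtle(concept_id, pref_label, alt_labels=(),
                      broader=None, imported_from=None, lang="pt"):
    """Serialise one concept: preferred term, synonyms, optional parent and source."""
    props = [f'skos:prefLabel "{escape(pref_label)}"@{lang}']
    props += [f'skos:altLabel "{escape(alt)}"@{lang}' for alt in alt_labels]
    if broader:
        # Hierarchical relationship to a more general concept.
        props.append(f"skos:broader chv:{broader}")
    if imported_from:
        # Provenance: where the term was taken from.
        props.append(f"pav:importedFrom <{imported_from}>")
    head = f"chv:{concept_id} a skos:Concept"
    return " ;\n".join([head] + ["    " + p for p in props]) + " ."


def document(concepts):
    """Prepend prefix declarations to a list of serialised concepts."""
    prefixes = [
        f"@prefix skos: <{SKOS}> .",
        f"@prefix pav: <{PAV}> .",
        f"@prefix chv: <{CHV}> .",
    ]
    return "\n".join(prefixes) + "\n\n" + "\n\n".join(concepts) + "\n"


if __name__ == "__main__":
    # Hypothetical example terms, chosen only to show a technical/lay pairing.
    ttl = concept_to_turtle(
        "c1", "hipertensão", ["pressão alta"],
        broader="c0", imported_from="http://example.org/source/1",
    )
    print(document([ttl]))
```

In this layout each concept carries exactly one preferred label per language and any number of alternative labels, which mirrors the preferred-term/synonym split reported for the CHV-RDF (66,992 + 84,003 = 150,995 terms).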
id UFSP_c233cacdf30a30350adad58a83044bbc
oai_identifier_str oai:repositorio.unifesp.br/:11600/59994
network_acronym_str UFSP
network_name_str Repositório Institucional da UNIFESP
repository_id_str 3465
dc.title.none.fl_str_mv Vocabulário de saúde do consumidor em idioma português
Consumer health vocabulary in Brazilian Portuguese language
title Vocabulário de saúde do consumidor em idioma português
spellingShingle Vocabulário de saúde do consumidor em idioma português
Tenorio, Josceli Maria [UNIFESP]
Consumer Health Vocabulary
Vocabulário De Saúde Do Consumidor
title_short Vocabulário de saúde do consumidor em idioma português
title_full Vocabulário de saúde do consumidor em idioma português
title_fullStr Vocabulário de saúde do consumidor em idioma português
title_full_unstemmed Vocabulário de saúde do consumidor em idioma português
title_sort Vocabulário de saúde do consumidor em idioma português
author Tenorio, Josceli Maria [UNIFESP]
author_facet Tenorio, Josceli Maria [UNIFESP]
author_role author
dc.contributor.none.fl_str_mv Pisa, Ivan Torres [UNIFESP]
http://lattes.cnpq.br/2841925497526792
http://lattes.cnpq.br/6362966376837352
Universidade Federal de São Paulo (UNIFESP)
dc.contributor.author.fl_str_mv Tenorio, Josceli Maria [UNIFESP]
dc.subject.por.fl_str_mv Consumer Health Vocabulary
Vocabulário De Saúde Do Consumidor
topic Consumer Health Vocabulary
Vocabulário De Saúde Do Consumidor
publishDate 2019
dc.date.none.fl_str_mv 2019-11-13
2021-01-19T16:37:33Z
2021-01-19T16:37:33Z
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://sucupira.capes.gov.br/sucupira/public/consultas/coleta/trabalhoConclusao/viewTrabalhoConclusao.jsf?popup=true&id_trabalho=9178759
TENÓRIO, Josceli Maria. Vocabulário de saúde do consumidor em idioma português. 2019. 131f. Tese (Doutorado em Gestão e Informática em Saúde) – Escola Paulista de Medicina, Universidade Federal de São Paulo. São Paulo, 2019.
Josceli Maria Tenório-A.pdf
https://repositorio.unifesp.br/handle/11600/59994
url https://sucupira.capes.gov.br/sucupira/public/consultas/coleta/trabalhoConclusao/viewTrabalhoConclusao.jsf?popup=true&id_trabalho=9178759
https://repositorio.unifesp.br/handle/11600/59994
identifier_str_mv TENÓRIO, Josceli Maria. Vocabulário de saúde do consumidor em idioma português. 2019. 131f. Tese (Doutorado em Gestão e Informática em Saúde) – Escola Paulista de Medicina, Universidade Federal de São Paulo. São Paulo, 2019.
Josceli Maria Tenório-A.pdf
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Federal de São Paulo (UNIFESP)
publisher.none.fl_str_mv Universidade Federal de São Paulo (UNIFESP)
dc.source.none.fl_str_mv reponame:Repositório Institucional da UNIFESP
instname:Universidade Federal de São Paulo (UNIFESP)
instacron:UNIFESP
instname_str Universidade Federal de São Paulo (UNIFESP)
instacron_str UNIFESP
institution UNIFESP
reponame_str Repositório Institucional da UNIFESP
collection Repositório Institucional da UNIFESP
repository.name.fl_str_mv Repositório Institucional da UNIFESP - Universidade Federal de São Paulo (UNIFESP)
repository.mail.fl_str_mv biblioteca.csp@unifesp.br
_version_ 1814268302433714176