Proposal for an automatic extraction for medical term candidates processing linguistic information. Description and evaluation of results

Bibliographic details
Main author: Koza Orellana, Walter
Publication date: 2015
Document type: Article
Language: por, eng
Source title: Alfa (São José do Rio Preto. Online)
Full text: https://periodicos.fclar.unesp.br/alfa/article/view/6440
Abstract: This article describes a method for automatically extracting term candidates from the medical domain by processing linguistic information. Lexical, morphological, and syntactic rules were used. First, detection was performed with a standard dictionary that assigned the tag ‘MED’ (‘MEDICAL’) to words that could be considered terms. Morphological and syntactic rules were then used to deduce the part of speech of the words not covered by the dictionary (WNCD). Afterwards, noun phrases containing WNCD and MED words were assembled and extracted as term candidates for the domain. The software used was Smorph and the Post Smorph Module (MPS), which work together as a unit, and Xfst. Smorph performs the morphological analysis of character strings, and MPS works on local grammars. Xfst is a finite-state tool that operates on character strings, assigning previously declared categories to them to allow the automatic analysis of expressions. The method was tested on a section of the corpus of clinical cases compiled by Burdiles (CCCM - 2009) containing 217,258 words. The results showed a precision of 92.58%, a recall of 95.02%, and an F-measure of 93.78%.
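The three stages named in the abstract (dictionary tagging, part-of-speech deduction for out-of-dictionary words, and noun-phrase assembly) can be illustrated with a minimal Python sketch. It is not the author's implementation: the article used Smorph, MPS, and Xfst on Spanish text, whereas the toy English lexicons, the suffix rules, and the names `tag_token`, `extract_candidates`, and `f_measure` below are all assumptions made for illustration. The last function only checks that the reported F-measure is consistent with the reported precision and recall.

```python
import re

# Hypothetical toy lexicons; the article's actual dictionaries are not reproduced here.
MEDICAL_DICTIONARY = {"fever", "pneumonia", "fracture"}
GENERAL_DICTIONARY = {
    "the": "DET", "a": "DET", "patient": "NOUN", "presented": "VERB",
    "with": "PREP", "and": "CONJ", "severe": "ADJ", "of": "PREP",
}
# Hypothetical suffix rules standing in for the morphological rules.
SUFFIX_RULES = [("itis", "NOUN"), ("osis", "NOUN"), ("ectomy", "NOUN"),
                ("al", "ADJ"), ("ic", "ADJ")]


def tag_token(token):
    """Return (token, part_of_speech, source) for one word."""
    word = token.lower()
    if word in MEDICAL_DICTIONARY:
        return (token, "NOUN", "MED")          # dictionary hit: medical term
    if word in GENERAL_DICTIONARY:
        return (token, GENERAL_DICTIONARY[word], "GEN")
    for suffix, pos in SUFFIX_RULES:           # word not in the dictionary (WNCD)
        if word.endswith(suffix):
            return (token, pos, "WNCD")
    return (token, "UNK", "WNCD")              # no rule applied


def extract_candidates(text):
    """Collect adjective/noun runs that contain a noun and a MED or WNCD word."""
    tagged = [tag_token(t) for t in re.findall(r"[A-Za-z]+", text)]
    candidates, run = [], []
    for item in tagged + [("", "END", "GEN")]:  # sentinel flushes the last run
        if item[1] in ("ADJ", "NOUN"):
            run.append(item)
            continue
        has_noun = any(pos == "NOUN" for _, pos, _ in run)
        has_domain = any(src in ("MED", "WNCD") for _, _, src in run)
        if has_noun and has_domain:
            candidates.append(" ".join(tok for tok, _, _ in run))
        run = []
    return candidates


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


sentence = "The patient presented with severe bronchitis and fever"
print(extract_candidates(sentence))
print(round(f_measure(0.9258, 0.9502), 4))
```

In this sketch "bronchitis" is absent from both lexicons, so its part of speech is deduced from the suffix rule, and the phrase it forms with "severe" is kept as a candidate, while "patient" alone is dropped because it carries no MED or WNCD word.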
id UNESP-4_309e0c634d7e13003a489a59a5fd50c6
oai_identifier_str oai:ojs.pkp.sfu.ca:article/6440
network_acronym_str UNESP-4
network_name_str Alfa (São José do Rio Preto. Online)
repository_id_str
dc.title.none.fl_str_mv Proposal for an automatic extraction for medical term candidates processing linguistic information. Description and evaluation of results
Propuesta de extracción automática de candidatos a término del dominio médico procesando información lingüística. Descripción y evaluación de resultados
author Koza Orellana, Walter
author_facet Koza Orellana, Walter
author_role author
dc.contributor.author.fl_str_mv Koza Orellana, Walter
dc.subject.por.fl_str_mv Medical terminology
Automatic extraction
Linguistic information
Terms candidate
Terminología médica
Extracción automática
Información lingüística
Candidatos a término
publishDate 2015
dc.date.none.fl_str_mv 2015-02-23
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://periodicos.fclar.unesp.br/alfa/article/view/6440
10.1590/1981-5794-1502-5
url https://periodicos.fclar.unesp.br/alfa/article/view/6440
identifier_str_mv 10.1590/1981-5794-1502-5
dc.language.iso.fl_str_mv por
eng
language por
eng
dc.relation.none.fl_str_mv https://periodicos.fclar.unesp.br/alfa/article/view/6440/5252
https://periodicos.fclar.unesp.br/alfa/article/view/6440/5260
dc.rights.driver.fl_str_mv Copyright (c) 2015 ALFA: Revista de Linguística
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Copyright (c) 2015 ALFA: Revista de Linguística
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
application/pdf
dc.publisher.none.fl_str_mv UNESP
publisher.none.fl_str_mv UNESP
dc.source.none.fl_str_mv ALFA: Revista de Linguística; v. 59 n. 1 (2015)
1981-5794
reponame:Alfa (São José do Rio Preto. Online)
instname:Universidade Estadual Paulista (UNESP)
instacron:UNESP
instname_str Universidade Estadual Paulista (UNESP)
instacron_str UNESP
institution UNESP
reponame_str Alfa (São José do Rio Preto. Online)
collection Alfa (São José do Rio Preto. Online)
repository.name.fl_str_mv Alfa (São José do Rio Preto. Online) - Universidade Estadual Paulista (UNESP)
repository.mail.fl_str_mv alfa@unesp.br
_version_ 1800214376969404416