Extracting compound terms from domain corpora

Bibliographic details
Main author: Lopes, Lucelene
Publication date: 2010
Other authors: Vieira, Renata; Finatto, Maria José Bocorny; Martins, Daniel
Document type: Article
Language: eng
Source title: Repositório Institucional da UFRGS
Full text: http://hdl.handle.net/10183/174302
Abstract: The need for domain ontologies motivates research on structured information extraction from texts. A foundational part of this process is the identification of domain-relevant compound terms. This paper presents an evaluation of compound term extraction from a corpus in the domain of Pediatrics. Bigrams and trigrams were automatically extracted from a corpus composed of 283 texts from a Portuguese-language journal, Jornal de Pediatria, using three different extraction methods. Because these methods generate a large number of candidates, we analyzed the quality of the resulting terms across the different methods and cut-off points. The evaluation is reported using metrics such as precision, recall, and F-measure, which are computed against a hand-made reference list of domain-relevant compounds.
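The evaluation described in the abstract can be illustrated with a minimal sketch. This record does not name the three extraction methods, so the sketch assumes raw frequency ranking as a stand-in. The function names (ngrams, rank_candidates, evaluate), the toy sentences, and the reference list are hypothetical and are not taken from the paper. The sketch only shows how precision, recall, and F-measure change as the cut-off point moves down a ranked candidate list.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous n-token tuple in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rank_candidates(sentences, n):
    """Rank n-gram candidates by raw frequency.

    Assumption: frequency ranking stands in for the paper's extraction
    methods, which this record does not describe.
    """
    counts = Counter()
    for tokens in sentences:
        counts.update(ngrams(tokens, n))
    # Most frequent first; ties broken alphabetically for determinism.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ordered]


def evaluate(candidates, reference, cutoff):
    """Precision, recall and F-measure of the top `cutoff` candidates."""
    kept = set(candidates[:cutoff])
    hits = len(kept & reference)
    precision = hits / len(kept) if kept else 0.0
    recall = hits / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy data (invented for illustration, not from the Pediatrics corpus).
sentences = [
    ["aleitamento", "materno", "exclusivo"],
    ["aleitamento", "materno", "e", "peso"],
    ["baixo", "peso", "ao", "nascer"],
]
reference = {("aleitamento", "materno"), ("baixo", "peso")}

ranked = rank_candidates(sentences, 2)
for cutoff in (1, 3, len(ranked)):
    p, r, f = evaluate(ranked, reference, cutoff)
    print(f"cutoff={cutoff} precision={p:.2f} recall={r:.2f} f={f:.2f}")
```

Lowering the cut-off keeps only the top-ranked candidates, which tends to raise precision at the cost of recall. Reporting F-measure at several cut-off points makes that trade-off visible.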
id UFRGS-2_920f37ce74fc3198ae67db04891f1a1b
oai_identifier_str oai:www.lume.ufrgs.br:10183/174302
network_acronym_str UFRGS-2
network_name_str Repositório Institucional da UFRGS
repository_id_str
dc.title.pt_BR.fl_str_mv Extracting compound terms from domain corpora
title Extracting compound terms from domain corpora
spellingShingle Extracting compound terms from domain corpora
Lopes, Lucelene
Ontologia
Terminologia
Term extraction
Statistical and linguistic methods
Ontology automatic construction
Extraction from corpora
title_short Extracting compound terms from domain corpora
title_full Extracting compound terms from domain corpora
title_fullStr Extracting compound terms from domain corpora
title_full_unstemmed Extracting compound terms from domain corpora
title_sort Extracting compound terms from domain corpora
author Lopes, Lucelene
author_facet Lopes, Lucelene
Vieira, Renata
Finatto, Maria José Bocorny
Martins, Daniel
author_role author
author2 Vieira, Renata
Finatto, Maria José Bocorny
Martins, Daniel
author2_role author
author
author
dc.contributor.author.fl_str_mv Lopes, Lucelene
Vieira, Renata
Finatto, Maria José Bocorny
Martins, Daniel
dc.subject.por.fl_str_mv Ontologia
Terminologia
topic Ontologia
Terminologia
Term extraction
Statistical and linguistic methods
Ontology automatic construction
Extraction from corpora
dc.subject.eng.fl_str_mv Term extraction
Statistical and linguistic methods
Ontology automatic construction
Extraction from corpora
description The need for domain ontologies motivates research on structured information extraction from texts. A foundational part of this process is the identification of domain-relevant compound terms. This paper presents an evaluation of compound term extraction from a corpus in the domain of Pediatrics. Bigrams and trigrams were automatically extracted from a corpus composed of 283 texts from a Portuguese-language journal, Jornal de Pediatria, using three different extraction methods. Because these methods generate a large number of candidates, we analyzed the quality of the resulting terms across the different methods and cut-off points. The evaluation is reported using metrics such as precision, recall, and F-measure, which are computed against a hand-made reference list of domain-relevant compounds.
publishDate 2010
dc.date.issued.fl_str_mv 2010
dc.date.accessioned.fl_str_mv 2018-04-03T02:26:06Z
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/other
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10183/174302
dc.identifier.issn.pt_BR.fl_str_mv 0104-6500
dc.identifier.nrb.pt_BR.fl_str_mv 001057475
identifier_str_mv 0104-6500
001057475
url http://hdl.handle.net/10183/174302
dc.language.iso.fl_str_mv eng
language eng
dc.relation.ispartof.pt_BR.fl_str_mv Journal of the Brazilian Computer Society. Rio de Janeiro, RJ. Vol. 16 (2010), p. [247]-259
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFRGS
instname:Universidade Federal do Rio Grande do Sul (UFRGS)
instacron:UFRGS
instname_str Universidade Federal do Rio Grande do Sul (UFRGS)
instacron_str UFRGS
institution UFRGS
reponame_str Repositório Institucional da UFRGS
collection Repositório Institucional da UFRGS
bitstream.url.fl_str_mv http://www.lume.ufrgs.br/bitstream/10183/174302/1/001057475.pdf
http://www.lume.ufrgs.br/bitstream/10183/174302/2/001057475.pdf.txt
http://www.lume.ufrgs.br/bitstream/10183/174302/3/001057475.pdf.jpg
bitstream.checksum.fl_str_mv c51b687267e40ac48ec0c88dc7ceff43
c7930720c541c7655899f827f356e1a7
3cdbca8217aeead8b64f6edea726e7c3
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFRGS - Universidade Federal do Rio Grande do Sul (UFRGS)
repository.mail.fl_str_mv
_version_ 1801224941835649024