Extracting compound terms from domain corpora
Main author: | Lopes, Lucelene |
---|---|
Publication date: | 2010 |
Other authors: | Vieira, Renata; Finatto, Maria José Bocorny; Martins, Daniel |
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da UFRGS |
Full text: | http://hdl.handle.net/10183/174302 |
Abstract: | The need for domain ontologies motivates research on structured information extraction from texts. A foundational part of this process is the identification of domain-relevant compound terms. This paper presents an evaluation of compound term extraction from a corpus in the domain of Pediatrics. Bigrams and trigrams were automatically extracted from a corpus composed of 283 texts from a Portuguese journal, Jornal de Pediatria, using three different extraction methods. Because these methods generate a large number of candidates, we analyzed the quality of the resulting terms according to different methods and cut-off points. The evaluation is reported using metrics such as precision, recall and f-measure, which are computed against a hand-made reference list of domain-relevant compounds. |
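The evaluation described in the abstract (precision, recall and f-measure at different cut-off points, measured against a reference list) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `evaluate_at_cutoff`, the toy candidate ranking and the toy reference list are all hypothetical, and the sketch assumes the candidates are already ranked by some extraction score and that matching is by exact string equality.

```python
from typing import Iterable, Sequence, Tuple


def evaluate_at_cutoff(ranked_candidates: Sequence[str],
                       reference: Iterable[str],
                       cutoff: int) -> Tuple[float, float, float]:
    """Return (precision, recall, f_measure) for the top `cutoff` candidates."""
    reference_set = set(reference)
    # Keep only the best-ranked candidates up to the cut-off point.
    selected = set(ranked_candidates[:cutoff])
    hits = len(selected & reference_set)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(reference_set) if reference_set else 0.0
    denom = precision + recall
    # F-measure as the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical toy data, not taken from the paper.
candidates = ["recém nascido", "aleitamento materno",
              "presente estudo", "peso ao nascer"]
reference = {"recém nascido", "aleitamento materno",
             "peso ao nascer", "faixa etária"}

for k in (2, 4):
    p, r, f = evaluate_at_cutoff(candidates, reference, k)
    print(f"cut-off={k}: precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")
```

Lowering the cut-off typically trades recall for precision, which is why the abstract reports results at several cut-off points rather than at a single one.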
id |
UFRGS-2_920f37ce74fc3198ae67db04891f1a1b |
---|---|
oai_identifier_str |
oai:www.lume.ufrgs.br:10183/174302 |
network_acronym_str |
UFRGS-2 |
network_name_str |
Repositório Institucional da UFRGS |
repository_id_str |
|
dc.title.pt_BR.fl_str_mv |
Extracting compound terms from domain corpora |
author |
Lopes, Lucelene |
author_facet |
Lopes, Lucelene; Vieira, Renata; Finatto, Maria José Bocorny; Martins, Daniel |
author_role |
author |
author2 |
Vieira, Renata; Finatto, Maria José Bocorny; Martins, Daniel |
author2_role |
author author author |
dc.contributor.author.fl_str_mv |
Lopes, Lucelene; Vieira, Renata; Finatto, Maria José Bocorny; Martins, Daniel |
dc.subject.por.fl_str_mv |
Ontologia; Terminologia |
topic |
Ontologia; Terminologia; Term extraction; Statistical and linguistic methods; Ontology automatic construction; Extraction from corpora |
dc.subject.eng.fl_str_mv |
Term extraction; Statistical and linguistic methods; Ontology automatic construction; Extraction from corpora |
publishDate |
2010 |
dc.date.issued.fl_str_mv |
2010 |
dc.date.accessioned.fl_str_mv |
2018-04-03T02:26:06Z |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article; info:eu-repo/semantics/other |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10183/174302 |
dc.identifier.issn.pt_BR.fl_str_mv |
0104-6500 |
dc.identifier.nrb.pt_BR.fl_str_mv |
001057475 |
identifier_str_mv |
0104-6500; 001057475 |
url |
http://hdl.handle.net/10183/174302 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.ispartof.pt_BR.fl_str_mv |
Journal of the Brazilian Computer Society. Rio de Janeiro, RJ. Vol. 16 (2010), p. [247]-259 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFRGS; instname:Universidade Federal do Rio Grande do Sul (UFRGS); instacron:UFRGS |
instname_str |
Universidade Federal do Rio Grande do Sul (UFRGS) |
instacron_str |
UFRGS |
institution |
UFRGS |
reponame_str |
Repositório Institucional da UFRGS |
collection |
Repositório Institucional da UFRGS |
bitstream.url.fl_str_mv |
http://www.lume.ufrgs.br/bitstream/10183/174302/1/001057475.pdf; http://www.lume.ufrgs.br/bitstream/10183/174302/2/001057475.pdf.txt; http://www.lume.ufrgs.br/bitstream/10183/174302/3/001057475.pdf.jpg |
bitstream.checksum.fl_str_mv |
c51b687267e40ac48ec0c88dc7ceff43; c7930720c541c7655899f827f356e1a7; 3cdbca8217aeead8b64f6edea726e7c3 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5; MD5; MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFRGS - Universidade Federal do Rio Grande do Sul (UFRGS) |
repository.mail.fl_str_mv |
|
_version_ |
1801224941835649024 |