DptOIE: a portuguese Open Information Extraction system based on dependency analysis

Bibliographic details
Main author: Oliveira, Leandro de
Publication date: 2019
Other authors: Claro, Daniela Barreiro
Document type: Article
Language: eng
Source title: Repositório Institucional da UFBA
Full text: http://repositorio.ufba.br/ri/handle/ri/30719
Abstract: Under submission to a journal
id UFBA-2_258e253e79ee236f9bcf581e50687a60
oai_identifier_str oai:repositorio.ufba.br:ri/30719
network_acronym_str UFBA-2
network_name_str Repositório Institucional da UFBA
repository_id_str 1932
spelling Oliveira, Leandro de
Claro, Daniela Barreiro
2019-10-09T20:50:05Z
2019-10-09T20:50:05Z
2019-10-09
http://repositorio.ufba.br/ri/handle/ri/30719
Under submission to a journal
It is estimated that more than 80% of the information on the Web is stored in textual form. For humans, extracting useful information from the data that appears daily is difficult. To automate the process, Open Information Extraction (OIE) methods, which are capable of extracting facts from large textual bases, have been proposed. At first, most OIE methods were developed for the English language. However, other languages, such as Portuguese, have attracted special attention, since Portuguese accounts for approximately 2.5% of all content available on websites. For English, methods based on hand-crafted rules and dependency analysis have achieved good results. Nevertheless, methods based on similar approaches for Portuguese have not shown equivalent performance. We believe that the rules defined so far are generic and do not cover specific aspects of the language. For this reason, our DptOIE method defines a new set of hand-crafted rules and explores sentences through dependency analysis using a depth-first search (DFS) approach. DptOIE was compared against two other OIE methods that extract facts in Portuguese: PragmaticOIE and ArgOE. DptOIE outperforms both, obtaining a greater area under the precision-yield curve. Its precision was higher, as was the number of coherent facts extracted. As far as we know, this is the best-performing OIE method for the Portuguese language.
Submitted by Barreiro Claro Daniela (dclaro@ufba.br) on 2019-08-26T16:49:28Z. No. of bitstreams: 1. DptOIE_Leandro_Linguamatica.pdf: 886971 bytes, checksum: bef14519f5d1d73c2985cab745f26079 (MD5)
Approved for entry into archive by Solange Rocha (soluny@gmail.com) on 2019-10-09T20:50:05Z (GMT). No. of bitstreams: 1. DptOIE_Leandro_Linguamatica.pdf: 886971 bytes, checksum: bef14519f5d1d73c2985cab745f26079 (MD5)
Made available in DSpace on 2019-10-09T20:50:05Z (GMT). No. of bitstreams: 1. DptOIE_Leandro_Linguamatica.pdf: 886971 bytes, checksum: bef14519f5d1d73c2985cab745f26079 (MD5)
FAPESB/CAPES
Salvador
Open Information Extraction
Dependency analysis
Depth-first search
DptOIE: a portuguese Open Information Extraction system based on dependency analysis
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/article
info:eu-repo/semantics/openAccess
eng
reponame:Repositório Institucional da UFBA
instname:Universidade Federal da Bahia (UFBA)
instacron:UFBA
ORIGINAL DptOIE_Leandro_Linguamatica.pdf application/pdf 886971 https://repositorio.ufba.br/bitstream/ri/30719/1/DptOIE_Leandro_Linguamatica.pdf bef14519f5d1d73c2985cab745f26079 MD5 1
LICENSE license.txt text/plain 1582 https://repositorio.ufba.br/bitstream/ri/30719/2/license.txt 907e2b7d511fb2c3e42dbdd41a6197c6 MD5 2
TEXT DptOIE_Leandro_Linguamatica.pdf.txt Extracted text text/plain 67299 https://repositorio.ufba.br/bitstream/ri/30719/3/DptOIE_Leandro_Linguamatica.pdf.txt 285dc9469fe519d5141cef9f240cf01d MD5 3
ri/30719
2022-02-21 00:10:21.306
oai:repositorio.ufba.br:ri/30719
Repositório Institucional
PUB
http://192.188.11.11:8080/oai/request
opendoar:1932
2022-02-21T03:10:21
Repositório Institucional da UFBA - Universidade Federal da Bahia (UFBA)
false
dc.title.pt_BR.fl_str_mv DptOIE: a portuguese Open Information Extraction system based on dependency analysis
title DptOIE: a portuguese Open Information Extraction system based on dependency analysis
spellingShingle DptOIE: a portuguese Open Information Extraction system based on dependency analysis
Oliveira, Leandro de
Open Information Extraction
Dependency analysis
Depth-first search
title_short DptOIE: a portuguese Open Information Extraction system based on dependency analysis
title_full DptOIE: a portuguese Open Information Extraction system based on dependency analysis
title_fullStr DptOIE: a portuguese Open Information Extraction system based on dependency analysis
title_full_unstemmed DptOIE: a portuguese Open Information Extraction system based on dependency analysis
title_sort DptOIE: a portuguese Open Information Extraction system based on dependency analysis
author Oliveira, Leandro de
author_facet Oliveira, Leandro de
Claro, Daniela Barreiro
author_role author
author2 Claro, Daniela Barreiro
author2_role author
dc.contributor.author.fl_str_mv Oliveira, Leandro de
Claro, Daniela Barreiro
dc.subject.por.fl_str_mv Open Information Extraction
Dependency analysis
Depth-first search
topic Open Information Extraction
Dependency analysis
Depth-first search
description Under submission to a journal
publishDate 2019
dc.date.accessioned.fl_str_mv 2019-10-09T20:50:05Z
dc.date.available.fl_str_mv 2019-10-09T20:50:05Z
dc.date.issued.fl_str_mv 2019-10-09
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://repositorio.ufba.br/ri/handle/ri/30719
url http://repositorio.ufba.br/ri/handle/ri/30719
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFBA
instname:Universidade Federal da Bahia (UFBA)
instacron:UFBA
instname_str Universidade Federal da Bahia (UFBA)
instacron_str UFBA
institution UFBA
reponame_str Repositório Institucional da UFBA
collection Repositório Institucional da UFBA
bitstream.url.fl_str_mv https://repositorio.ufba.br/bitstream/ri/30719/1/DptOIE_Leandro_Linguamatica.pdf
https://repositorio.ufba.br/bitstream/ri/30719/2/license.txt
https://repositorio.ufba.br/bitstream/ri/30719/3/DptOIE_Leandro_Linguamatica.pdf.txt
bitstream.checksum.fl_str_mv bef14519f5d1d73c2985cab745f26079
907e2b7d511fb2c3e42dbdd41a6197c6
285dc9469fe519d5141cef9f240cf01d
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFBA - Universidade Federal da Bahia (UFBA)
repository.mail.fl_str_mv
_version_ 1808459600513466368