Aprendizado de ranking de entidades aplicado aos dados do governo brasileiro

Soares, Paulo Henrique Maia

Bibliographic details
Main author:	Soares, Paulo Henrique Maia
Publication date:	2020
Document type:	Master's thesis
Language:	por
Source title:	Repositório Institucional da UFU
Full text:	https://repositorio.ufu.br/handle/123456789/31882 https://doi.org/10.14393/ufu.di.2021.6003
Abstract:	The volume of government transparency information made available in recent years, driven by legislative requirements, has grown to the point that finding the desired information is increasingly difficult. Traditional search engines such as Google, Yahoo and Bing return documents ordered by their relevance to the submitted query. The field concerned with returning relevant documents to the user is known as Information Retrieval, and it can be aided by machine learning algorithms that improve the ordering of documents, an approach known in this context as Learning to Rank (L2R). The literature offers several algorithms for L2R problems, each of which addresses the ranking problem according to different criteria. In the context of government documents, it is also possible to identify the main entities present in the documents most relevant to a given query. This work aimed to obtain an ordering of the documents available on the Brazilian Government Data Portal using Learning to Rank, and to extract entity information from unstructured, semi-structured and tabular data sources, which are common among the sources available on the Portal. To achieve this goal, state-of-the-art techniques were used to recognize named entities, and convex optimization techniques were used to model the L2R process. The results obtained were superior to those of the commercially available search engines (Google, Yahoo and Bing), since these index only the summary of the data sets on the Data Portal.

Item metadata

id	UFU_0e5b36f4067caa5915f6ae7ce0352742
oai_identifier_str	oai:repositorio.ufu.br:123456789/31882
network_acronym_str	UFU
network_name_str	Repositório Institucional da UFU
repository_id_str
dc.title.none.fl_str_mv	Aprendizado de ranking de entidades aplicado aos dados do governo brasileiro Entity learning to rank applied to brazilian government data
title	Aprendizado de ranking de entidades aplicado aos dados do governo brasileiro
author	Soares, Paulo Henrique Maia
author_facet	Soares, Paulo Henrique Maia
author_role	author
dc.contributor.none.fl_str_mv	Albertini, Marcelo Keese http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4184508T7; Rios, Ricardo Araújo http://lattes.cnpq.br/0427387583450747; Maia, Marcelo de Almeida http://lattes.cnpq.br/4915659948263445
dc.contributor.author.fl_str_mv	Soares, Paulo Henrique Maia
dc.subject.por.fl_str_mv	Aprendizado de ranking; Recuperação de Informação; Computação; Aprendizado de máquina; CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO; Recuperação da informação; Aprendizado do computador; Transparência na administração pública
topic	Aprendizado de ranking; Recuperação de Informação; Computação; Aprendizado de máquina; CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO; Recuperação da informação; Aprendizado do computador; Transparência na administração pública
publishDate	2020
dc.date.none.fl_str_mv	2020-12-03 2021-06-07T17:32:22Z 2021-06-07T17:32:22Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	SOARES, Paulo Henrique Maia. Aprendizado de ranking de entidades aplicado aos dados do governo brasileiro. 2020. 71 f. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal de Uberlândia, Uberlândia, 2020. DOI https://doi.org/10.14393/ufu.di.2021.6003. https://repositorio.ufu.br/handle/123456789/31882 https://doi.org/10.14393/ufu.di.2021.6003
identifier_str_mv	SOARES, Paulo Henrique Maia. Aprendizado de ranking de entidades aplicado aos dados do governo brasileiro. 2020. 71 f. Dissertação (Mestrado em Ciência da Computação) - Universidade Federal de Uberlândia, Uberlândia, 2020. DOI https://doi.org/10.14393/ufu.di.2021.6003.
url	https://repositorio.ufu.br/handle/123456789/31882 https://doi.org/10.14393/ufu.di.2021.6003
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Federal de Uberlândia Brasil Programa de Pós-graduação em Ciência da Computação
publisher.none.fl_str_mv	Universidade Federal de Uberlândia Brasil Programa de Pós-graduação em Ciência da Computação
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFU instname:Universidade Federal de Uberlândia (UFU) instacron:UFU
instname_str	Universidade Federal de Uberlândia (UFU)
instacron_str	UFU
institution	UFU
reponame_str	Repositório Institucional da UFU
collection	Repositório Institucional da UFU
repository.name.fl_str_mv	Repositório Institucional da UFU - Universidade Federal de Uberlândia (UFU)
repository.mail.fl_str_mv	diinf@dirbi.ufu.br
_version_	1823695066639630336
