Indexing and querying dataspaces

Mergen, Sérgio Luis Sardi

Bibliographic details
Main author:	Mergen, Sérgio Luis Sardi
Publication date:	2011
Document type:	Thesis
Language:	eng
Source title:	Biblioteca Digital de Teses e Dissertações da UFRGS
Full text:	http://hdl.handle.net/10183/31134
Abstract:	Over the Web, distributed and heterogeneous sources with structured and related content form rich repositories of information commonly referred to as dataspaces. To provide access to this heterogeneous data, information integration systems have traditionally relied on the availability of a mediated schema, along with mappings between this schema and the schemas of the sources. On dataspaces, where sources are plentiful, autonomous and extremely volatile, a system based on a pre-defined mediated schema and mapping information presents several drawbacks. Notably, the cost of keeping the mappings up to date as new sources are found or existing sources change can be prohibitively high. We propose a novel querying architecture that requires neither a mediated schema nor source mappings, and that is based mainly on indexing mechanisms and on-the-fly rewriting algorithms. Our indexes are designed for data that is represented as relations, and are able to capture the structure of the sources, their instances and the connections between them. In the absence of a mediated schema, the user formulates structured queries based on what she expects to find. These queries are rewritten using a best-effort approach: the proposed rewriting algorithms compare a user query against the source schemas and produce a set of rewritings based on the matches found. Based on this architecture, two different querying approaches are tested. Experiments show that the indexing and rewriting algorithms are scalable, i.e., able to handle a very large number of structured Web sources, and that they support simple yet expressive queries that exploit the inherent structure of the data.

Item metadata

id	URGS_c7609487e5681230386678be0581d313
oai_identifier_str	oai:www.lume.ufrgs.br:10183/31134
network_acronym_str	URGS
network_name_str	Biblioteca Digital de Teses e Dissertações da UFRGS
repository_id_str	1853
dc.title.pt_BR.fl_str_mv	Indexing and querying dataspaces
author	Mergen, Sérgio Luis Sardi
author_facet	Mergen, Sérgio Luis Sardi
author_role	author
dc.contributor.author.fl_str_mv	Mergen, Sérgio Luis Sardi
dc.contributor.advisor1.fl_str_mv	Heuser, Carlos Alberto
contributor_str_mv	Heuser, Carlos Alberto
dc.subject.por.fl_str_mv	Recuperacao : Informacao Banco : Dados
topic	Recuperacao : Informacao Banco : Dados Dataspaces Data integration Search engine Indexing Query rewriting
dc.subject.eng.fl_str_mv	Dataspaces Data integration Search engine Indexing Query rewriting
publishDate	2011
dc.date.accessioned.fl_str_mv	2011-08-16T06:01:30Z
dc.date.issued.fl_str_mv	2011
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/doctoralThesis
format	doctoralThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10183/31134
dc.identifier.nrb.pt_BR.fl_str_mv	000781807
url	http://hdl.handle.net/10183/31134
identifier_str_mv	000781807
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da UFRGS instname:Universidade Federal do Rio Grande do Sul (UFRGS) instacron:UFRGS
instname_str	Universidade Federal do Rio Grande do Sul (UFRGS)
instacron_str	UFRGS
institution	UFRGS
reponame_str	Biblioteca Digital de Teses e Dissertações da UFRGS
collection	Biblioteca Digital de Teses e Dissertações da UFRGS
bitstream.url.fl_str_mv	http://www.lume.ufrgs.br/bitstream/10183/31134/2/000781807.pdf.txt http://www.lume.ufrgs.br/bitstream/10183/31134/1/000781807.pdf http://www.lume.ufrgs.br/bitstream/10183/31134/3/000781807.pdf.jpg
bitstream.checksum.fl_str_mv	9befe576e75eba0d22381d3cbd1c7dce 5109f8b05381abff050f986bc9de8696 0ff28ed21da7f2957a92ad99ef7c8a85
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5 MD5
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da UFRGS - Universidade Federal do Rio Grande do Sul (UFRGS)
repository.mail.fl_str_mv	lume@ufrgs.br
