Indexing and querying dataspaces
Main author: | Mergen, Sérgio Luis Sardi |
---|---|
Publication date: | 2011 |
Document type: | Thesis |
Language: | eng |
Source title: | Biblioteca Digital de Teses e Dissertações da UFRGS |
Full text: | http://hdl.handle.net/10183/31134 |
Abstract: | Over the Web, distributed and heterogeneous sources with structured and related content form rich repositories of information commonly referred to as dataspaces. To provide access to this heterogeneous data, information integration systems have traditionally relied on the availability of a mediated schema, along with mappings between this schema and the schemas of the sources. On dataspaces, where sources are plentiful, autonomous and extremely volatile, a system based on the existence of a pre-defined mediated schema and mapping information presents several drawbacks. Notably, the cost of keeping the mappings up to date as new sources are found or existing sources change can be prohibitively high. We propose a novel querying architecture that requires neither a mediated schema nor source mappings, and is based mainly on indexing mechanisms and on-the-fly rewriting algorithms. Our indexes are designed for data that is represented as relations, and are able to capture the structure of the sources, their instances and the connections between them. In the absence of a mediated schema, the user formulates structured queries based on what she expects to find. These queries are rewritten using a best-effort approach: the proposed rewriting algorithms compare a user query against the source schemas and produce a set of rewritings based on the matches found. Based on this architecture, two different querying approaches are tested. Experiments show that the indexing and rewriting algorithms are scalable, i.e., able to handle a very large number of structured Web sources, and that they support simple, yet expressive queries that exploit the inherent structure of the data. |
id |
URGS_c7609487e5681230386678be0581d313 |
---|---|
oai_identifier_str |
oai:www.lume.ufrgs.br:10183/31134 |
network_acronym_str |
URGS |
network_name_str |
Biblioteca Digital de Teses e Dissertações da UFRGS |
repository_id_str |
1853 |
spelling |
Mergen, Sérgio Luis Sardi; Heuser, Carlos Alberto; 2011-08-16T06:01:30Z; 2011; http://hdl.handle.net/10183/31134; 000781807; application/pdf; eng; Recuperacao : Informacao; Banco : Dados; Dataspaces; Data integration; Search engine; Indexing; Query rewriting; Indexing and querying dataspaces; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/doctoralThesis; Universidade Federal do Rio Grande do Sul; Instituto de Informática; Programa de Pós-Graduação em Computação; Porto Alegre, BR-RS; 2011; doctorate; info:eu-repo/semantics/openAccess; reponame:Biblioteca Digital de Teses e Dissertações da UFRGS; instname:Universidade Federal do Rio Grande do Sul (UFRGS); instacron:UFRGS; TEXT; 000781807.pdf.txt; 000781807.pdf.txt; Extracted Text; text/plain; 293663; http://www.lume.ufrgs.br/bitstream/10183/31134/2/000781807.pdf.txt; 9befe576e75eba0d22381d3cbd1c7dce; MD5; 2; ORIGINAL; 000781807.pdf; 000781807.pdf; Full text (English); application/pdf; 2700751; http://www.lume.ufrgs.br/bitstream/10183/31134/1/000781807.pdf; 5109f8b05381abff050f986bc9de8696; MD5; 1; THUMBNAIL; 000781807.pdf.jpg; 000781807.pdf.jpg; Generated Thumbnail; image/jpeg; 986; http://www.lume.ufrgs.br/bitstream/10183/31134/3/000781807.pdf.jpg; 0ff28ed21da7f2957a92ad99ef7c8a85; MD5; 3; 10183/31134; 2021-05-07 04:50:00.99511; oai:www.lume.ufrgs.br:10183/31134; Biblioteca Digital de Teses e Dissertações; https://lume.ufrgs.br/handle/10183/2; PUB; https://lume.ufrgs.br/oai/request; lume@ufrgs.br||lume@ufrgs.br; opendoar:1853; 2021-05-07T07:50; Biblioteca Digital de Teses e Dissertações da UFRGS - Universidade Federal do Rio Grande do Sul (UFRGS); false |
dc.title.pt_BR.fl_str_mv |
Indexing and querying dataspaces |
title |
Indexing and querying dataspaces |
spellingShingle |
Indexing and querying dataspaces Mergen, Sérgio Luis Sardi Recuperacao : Informacao Banco : Dados Dataspaces Data integration Search engine Indexing Query rewriting |
title_short |
Indexing and querying dataspaces |
title_full |
Indexing and querying dataspaces |
title_fullStr |
Indexing and querying dataspaces |
title_full_unstemmed |
Indexing and querying dataspaces |
title_sort |
Indexing and querying dataspaces |
author |
Mergen, Sérgio Luis Sardi |
author_facet |
Mergen, Sérgio Luis Sardi |
author_role |
author |
dc.contributor.author.fl_str_mv |
Mergen, Sérgio Luis Sardi |
dc.contributor.advisor1.fl_str_mv |
Heuser, Carlos Alberto |
contributor_str_mv |
Heuser, Carlos Alberto |
dc.subject.por.fl_str_mv |
Recuperacao : Informacao Banco : Dados |
topic |
Recuperacao : Informacao Banco : Dados Dataspaces Data integration Search engine Indexing Query rewriting |
dc.subject.eng.fl_str_mv |
Dataspaces Data integration Search engine Indexing Query rewriting |
publishDate |
2011 |
dc.date.accessioned.fl_str_mv |
2011-08-16T06:01:30Z |
dc.date.issued.fl_str_mv |
2011 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/doctoralThesis |
format |
doctoralThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10183/31134 |
dc.identifier.nrb.pt_BR.fl_str_mv |
000781807 |
url |
http://hdl.handle.net/10183/31134 |
identifier_str_mv |
000781807 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Biblioteca Digital de Teses e Dissertações da UFRGS instname:Universidade Federal do Rio Grande do Sul (UFRGS) instacron:UFRGS |
instname_str |
Universidade Federal do Rio Grande do Sul (UFRGS) |
instacron_str |
UFRGS |
institution |
UFRGS |
reponame_str |
Biblioteca Digital de Teses e Dissertações da UFRGS |
collection |
Biblioteca Digital de Teses e Dissertações da UFRGS |
bitstream.url.fl_str_mv |
http://www.lume.ufrgs.br/bitstream/10183/31134/2/000781807.pdf.txt http://www.lume.ufrgs.br/bitstream/10183/31134/1/000781807.pdf http://www.lume.ufrgs.br/bitstream/10183/31134/3/000781807.pdf.jpg |
bitstream.checksum.fl_str_mv |
9befe576e75eba0d22381d3cbd1c7dce 5109f8b05381abff050f986bc9de8696 0ff28ed21da7f2957a92ad99ef7c8a85 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 |
repository.name.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da UFRGS - Universidade Federal do Rio Grande do Sul (UFRGS) |
repository.mail.fl_str_mv |
lume@ufrgs.br||lume@ufrgs.br |
_version_ |
1810085206354296832 |