Logical big data integration and near real-time data analytics

Bibliographic details
Main author: Silva, Bruno
Publication date: 2023
Other authors: Moreira, José; de C. Costa, Rogério Luís
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10773/37907
Abstract: In the context of decision-making, there is a growing demand for near real-time data that traditional solutions, like data warehousing based on long-running ETL processes, cannot fully meet. On the other hand, existing logical data integration solutions are challenging because users must focus on data location and distribution details rather than on data analytics and decision-making. EasyBDI is an open-source system that provides logical integration of data and high-level business-oriented abstractions. It uses schema matching, integration, and mapping techniques to automatically identify partitioned data and propose a global schema. Users can then specify star schemas based on global entities and submit analytical queries to retrieve data from distributed data sources without knowing the organization and other technical details of the underlying systems. This work presents the algorithms and methods for global schema creation and query execution. Experimental results show that the overhead imposed by logical integration layers is relatively small compared to the execution times of distributed queries.
id RCAP_27c9d3cadb5999a231af21b1f6b44823
oai_identifier_str oai:ria.ua.pt:10773/37907
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Logical big data integration and near real-time data analytics
author Silva, Bruno
author_facet Silva, Bruno
Moreira, José
de C. Costa, Rogério Luís
author_role author
author2 Moreira, José
de C. Costa, Rogério Luís
author2_role author
author
dc.contributor.author.fl_str_mv Silva, Bruno
Moreira, José
de C. Costa, Rogério Luís
dc.subject.por.fl_str_mv Big data integration
Distributed databases
Near real-time OLAP
topic Big data integration
Distributed databases
Near real-time OLAP
publishDate 2023
dc.date.none.fl_str_mv 2023-05-12T00:00:00Z
2023-05-12
2025-05-12T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10773/37907
url http://hdl.handle.net/10773/37907
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 0169-023X
10.1016/j.datak.2023.102185
dc.rights.driver.fl_str_mv info:eu-repo/semantics/embargoedAccess
eu_rights_str_mv embargoedAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Elsevier
publisher.none.fl_str_mv Elsevier
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799137737105735680