DICOOGLE: No-SQL for supporting Big Data environments
Main author: | Alves, André Filipe Pereira |
---|---|
Publication date: | 2016 |
Document type: | Master's dissertation |
Language: | eng |
Source: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10773/17218 |
Abstract: | The last few years have seen a proliferation of different types of medical imaging modalities in healthcare institutions. At the same time, services are migrating to Cloud infrastructures. Thus, in addition to a scenario in which tremendous amounts of data are produced, we are moving towards a reality in which processes are increasingly distributed. This reality has created new technological challenges in the storage, management and handling of these data, which must be met to guarantee the high availability and performance of the information systems that deal with the images. The bioinformatics research group at the University of Aveiro has developed an Open Source Picture Archiving and Communication System (PACS) named Dicoogle. This system replaced the traditional relational database engine with an agile mechanism for indexing and retrieving data. As a result, it can extract, index and store all of an image's metadata, including any private elements, without re-engineering or reconfiguration. Among other use cases, the system has already indexed more than 22 million images in 3 hospitals in the Aveiro region. Currently, Dicoogle provides a solution based on the Apache Lucene library. However, this solution has performance issues in environments where large amounts of data must be handled and searched, particularly in data analytics scenarios. In this work, different technologies capable of supporting the database of an image repository were studied. Four solutions were then fully implemented: one based on relational databases, one on NoSQL and two on distinct text engines. A test platform was also developed to evaluate the performance and scalability of these solutions, enabling a comparative analysis. Finally, a hybrid medical image database architecture is proposed, implemented and validated. This proposal demonstrated significant gains in query and indexing times, and in scenarios that require wide-ranging data analysis. |
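The abstract's central technical claim is that an index, rather than a fixed relational schema, can absorb every metadata element of an image, including private ones, without re-engineering or reconfiguration. The following is a minimal Python sketch of that idea: a toy in-memory inverted index mapping (tag, value) pairs to image IDs. It is not Dicoogle's implementation, which the abstract says is based on Apache Lucene. The class name `MetadataIndex`, the image IDs, the private tag `(0009,1001)` and the case-insensitive value matching are all hypothetical choices made for illustration.

```python
from collections import defaultdict


class MetadataIndex:
    """Toy schemaless inverted index mapping (tag, value) -> image IDs.

    Hypothetical illustration only; not Dicoogle's Lucene-based code.
    """

    def __init__(self) -> None:
        # Posting lists: one set of image IDs per (tag, normalized value).
        self._postings: dict[tuple[str, str], set[str]] = defaultdict(set)
        # Stored fields: the original metadata, kept for retrieval.
        self._stored: dict[str, dict[str, str]] = {}

    def add(self, image_id: str, metadata: dict[str, str]) -> None:
        # Every tag is indexed as it arrives, including private or
        # previously unseen tags, so there is no schema to migrate.
        self._stored[image_id] = dict(metadata)
        for tag, value in metadata.items():
            self._postings[(tag, str(value).lower())].add(image_id)

    def search(self, criteria: dict[str, str]) -> set[str]:
        # AND query: intersect the posting lists of all (tag, value) pairs.
        # .get() does not insert into the defaultdict on a miss.
        postings = [
            self._postings.get((tag, str(value).lower()), set())
            for tag, value in criteria.items()
        ]
        return set.intersection(*postings) if postings else set()

    def get(self, image_id: str) -> dict[str, str]:
        # Return the stored (unnormalized) metadata of one image.
        return self._stored[image_id]


idx = MetadataIndex()
idx.add("img1", {"Modality": "CT", "BodyPartExamined": "CHEST"})
idx.add("img2", {"Modality": "MR", "BodyPartExamined": "HEAD"})
# A private tag that no earlier image had is indexed with no schema change.
idx.add("img3", {"Modality": "CT", "BodyPartExamined": "HEAD",
                 "(0009,1001)": "vendor-x"})

print(sorted(idx.search({"Modality": "CT"})))           # ['img1', 'img3']
print(sorted(idx.search({"(0009,1001)": "vendor-x"})))  # ['img3']
```

In this sketch, the index defines what is queryable. A new tag adds posting lists instead of requiring a new column, which is the flexibility the abstract attributes to replacing the relational engine.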
id | RCAP_5b8fecd02ecfee2f3d66e948dde6b5b1 |
---|---|
oai_identifier_str | oai:ria.ua.pt:10773/17218 |
network_acronym_str | RCAP |
network_name_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository_id_str | 7160 |
Abstract (Portuguese version, translated) | The last few years have been characterized by a proliferation of different types of medical imaging modalities in healthcare institutions. At the same time, we are witnessing a migration of services to Cloud infrastructures. Thus, in addition to a scenario in which tremendous volumes of data are produced, we are moving towards a reality in which processes are increasingly distributed. This reality has posed new technological challenges in archiving, transmission and visualization, particularly regarding the performance and scalability of the information systems that deal with images. The bioinformatics group at the University of Aveiro has been developing an innovative distributed medical image archiving system, the Dicoogle Open Source PACS. This system replaced the traditional relational database engine with an agile data indexing and retrieval mechanism. This makes it possible to extract, index and store all image metadata, including any private elements, without the need for re-engineering or reconfiguration processes. Among other use cases, this system has already indexed more than 22 million images in 3 hospitals in the Aveiro region. Currently, Dicoogle has a solution based on the Apache Lucene library. However, this solution has shown some performance problems in environments where large amounts of data need to be handled and searched, particularly in data analysis scenarios. In this dissertation, different technologies capable of supporting the database of an image repository were studied. Four solutions were then implemented, based on relational databases, NoSQL and indexing engines. A performance and scalability test platform was also developed, which enabled a comparative analysis of the implemented solutions. Finally, a hybrid medical image database architecture is proposed, implemented and validated. This proposal demonstrated significant gains in content search times and in scenarios of wide-ranging data analysis. |
author | Alves, André Filipe Pereira |
subjects (translated from Portuguese) | Computer and telematics engineering; Bioinformatics; Medical information systems; Diagnostic imaging; Data storage (images); Information retrieval; Relational databases |
dc.date.none.fl_str_mv | 2016-01-01T00:00:00Z; 2016; 2017-04-07T10:56:43Z |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/masterThesis |
dc.identifier.uri.fl_str_mv | http://hdl.handle.net/10773/17218; TID:201565480 |
dc.language.iso.fl_str_mv | eng |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv | application/pdf |
dc.publisher.none.fl_str_mv | Universidade de Aveiro |
dc.source.none.fl_str_mv | reponame: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos); instname: Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron: RCAAP |