DICOOGLE: No-SQL for supporting Big Data environments

Bibliographic details
Main author: Alves, André Filipe Pereira
Publication date: 2016
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10773/17218
Abstract: The last few years have been characterized by a proliferation of different types of medical imaging modalities in healthcare institutions. At the same time, services are migrating to Cloud infrastructures. Thus, in addition to a scenario where tremendous amounts of data are produced, we are moving toward a reality where processes are increasingly distributed. This reality has created new technological challenges regarding the storage, management and handling of this data, in order to guarantee high availability and performance of the information systems that deal with the images. Dicoogle, an open source Picture Archiving and Communication System (PACS), has been developed by the bioinformatics research group at the University of Aveiro. This system replaced the traditional relational database engine with an agile mechanism that indexes and retrieves data. It is thus possible to extract, index and store all of an image's metadata, including any private information, without any re-engineering or reconfiguration process. Among other use cases, this system has already indexed more than 22 million images in 3 hospitals in the Aveiro region. Currently, Dicoogle provides a solution based on the Apache Lucene library. However, it has performance issues in environments that need to handle and search over large amounts of data, particularly in data analytics scenarios. In the context of this work, different technologies capable of supporting the database of an image repository were studied. Subsequently, four solutions were fully implemented, based on relational databases, NoSQL and two distinct text engines. A test platform was also developed to evaluate the performance and scalability of these solutions, which allowed a comparative analysis of them. Finally, a hybrid medical image database architecture is proposed, which was implemented and validated. This proposal demonstrated significant gains in query and index times and in scenarios that require wide-ranging data analysis.
id RCAP_5b8fecd02ecfee2f3d66e948dde6b5b1
oai_identifier_str oai:ria.ua.pt:10773/17218
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv DICOOGLE: No-SQL for supporting Big Data environments
author Alves, André Filipe Pereira
author_facet Alves, André Filipe Pereira
author_role author
dc.contributor.author.fl_str_mv Alves, André Filipe Pereira
dc.subject.por.fl_str_mv Computer and telematics engineering
Bioinformatics
Medical information systems
Diagnostic imaging
Data storage - Images
Information retrieval
Relational databases
publishDate 2016
dc.date.none.fl_str_mv 2016-01-01T00:00:00Z
2016
2017-04-07T10:56:43Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10773/17218
TID:201565480
url http://hdl.handle.net/10773/17218
identifier_str_mv TID:201565480
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade de Aveiro
publisher.none.fl_str_mv Universidade de Aveiro
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799137574503055360