Graph databases for HR relationships

Rafael Araújo Moura

Bibliographic details
Main author:	Rafael Araújo Moura
Publication date:	2021
Document type:	Dissertation
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	https://hdl.handle.net/10216/137426
Abstract:	Human Resources data modeling makes heavy use of graph-like structures to hold objects (positions, contracts, jobs, cost centers, org units, etc.) and their relationships (an org unit reports to another org unit, a position belongs to an org unit and reports to another position, etc.). Traditionally, Human Resources databases are relational, and querying graph data stored in a relational database is highly inefficient. The main objective of this dissertation is to compare, in terms of performance and flexibility, a relational database and a graph database when both hold highly connected data, as is the case with Human Resources data. This required modeling a database based on a real scenario from a company that deals with this type of data on a daily basis. Queries were formulated to interact with all stored entities and the relationships between them, and to collect performance results for both databases. The written queries were also compared in terms of readability and expressiveness, with the objective of determining in which language it is easier to understand what is being queried and how much simpler it is to formulate the query. When executing queries that traverse a data structure almost completely, the graph database performed much better than the relational database, reaching execution times as much as 200 times smaller than those obtained in SQL. For hierarchical queries, only when the number of manager relationships per employee was increased by a factor of ten did Neo4j perform up to 3.5 times faster than SQL. These results concern databases with a maximum of one million employees, and the differences in performance are expected to grow for even larger and more connected databases.
Furthermore, it was concluded that database modeling with graphs is more intuitive and immediate, and that queries are faster to formulate, simpler and more readable in Cypher than in SQL. The analysed Cypher queries had half the lines of their SQL equivalents and used ASCII characters to represent nodes and relationships in pattern matching. More concise queries lower the probability of errors and make it easier for new developers to catch up on, understand and work with previously written queries.
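
To make the SQL versus Cypher contrast concrete, the following is a minimal sketch of one hierarchical query ("which positions does this position report to, directly or indirectly?"). Everything in it is an assumption made for illustration and is not taken from the dissertation: the `job_position` table, its columns, the sample rows, the `Position` label and the `REPORTS_TO` relationship type are hypothetical, and SQLite stands in for whichever relational engine was actually benchmarked. The Cypher text is shown only for side-by-side reading and is not executed.

```python
import sqlite3

# Hypothetical schema: each position points at the position it reports to.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job_position (
        id         INTEGER PRIMARY KEY,
        title      TEXT NOT NULL,
        reports_to INTEGER REFERENCES job_position(id)
    );
    INSERT INTO job_position VALUES
        (1, 'CEO',             NULL),
        (2, 'HR Director',     1),
        (3, 'Recruiting Lead', 2),
        (4, 'Recruiter',       3);
""")

# Relational form: a recursive CTE follows reports_to upwards, one join per level.
MANAGERS_SQL = """
WITH RECURSIVE chain(id, title, reports_to, depth) AS (
    SELECT p.id, p.title, p.reports_to, 0
    FROM job_position AS p
    WHERE p.id = :start
    UNION ALL
    SELECT m.id, m.title, m.reports_to, c.depth + 1
    FROM chain AS c
    JOIN job_position AS m ON m.id = c.reports_to
)
SELECT title FROM chain WHERE depth > 0 ORDER BY depth;
"""

# Graph form (illustrative only, not run here): a single variable-length
# pattern, with ASCII-art parentheses for nodes and arrows for relationships.
MANAGERS_CYPHER = """
MATCH (p:Position {id: $start})-[:REPORTS_TO*1..]->(m:Position)
RETURN m.title
"""


def managers_of(connection, position_id):
    """Return the titles above position_id, nearest manager first."""
    rows = connection.execute(MANAGERS_SQL, {"start": position_id}).fetchall()
    return [title for (title,) in rows]


print(managers_of(conn, 4))  # ['Recruiting Lead', 'HR Director', 'CEO']
```

The sketch says nothing about performance; it only shows why the two formulations differ in length, since the recursion that SQL spells out as a CTE is expressed in Cypher by the `*1..` quantifier on the relationship.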

Item metadata

id	RCAP_48d5db0687b96b22dcfe7b585ff81fef
oai_identifier_str	oai:repositorio-aberto.up.pt:10216/137426
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
dc.title.none.fl_str_mv	Graph databases for HR relationships
author	Rafael Araújo Moura
dc.subject.por.fl_str_mv	Engenharia electrotécnica, electrónica e informática Electrical engineering, Electronic engineering, Information engineering
publishDate	2021
dc.date.none.fl_str_mv	2021-10-11 2021-10-11T00:00:00Z 2024-10-10T00:00:00Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://hdl.handle.net/10216/137426 TID:202827780
url	https://hdl.handle.net/10216/137426
identifier_str_mv	TID:202827780
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/embargoedAccess
eu_rights_str_mv	embargoedAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_	1799136298400743424
