Redes de co-expressão entre genes codificantes de proteínas mitocondriais e todos os restantes genes nos vários tecidos humanos

Bibliographic details
Main author: João Alexandre Ribeiro de Almeida
Publication date: 2017
Document type: Dissertation
Language: por
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: https://hdl.handle.net/10216/106161
Abstract: Recent advances in genome sequencing allow the activity of all identified human genes (≈ 22,000 protein-encoding genes) to be studied in different contexts. However, current knowledge of gene interactions lags behind, especially when one of the elements is a gene encoding a mitochondrial protein (≈ 1,500). Mitochondrial proteins are encoded either by mitochondrial DNA (mtDNA; 13 proteins) or by nuclear DNA (nDNA; the remainder), which implies coordinated communication between the two genomes. Since mitochondria coordinate several life-critical cellular activities, namely energy production and cell death, deregulation of this communication is implicated in many complex diseases such as neurodegenerative diseases, cancer and diabetes. Thus, this work aimed to identify groups of highly co-expressed genes, pairing mitochondrial genes with all genes, and the associated protein networks in human tissues. Gene expression data were collected from the Genotype-Tissue Expression database (https://www.gtexportal.org/home/), covering 49 tissues (a total of 8527 samples, an average of 174 per tissue). The data were filtered to include only protein-encoding and physically non-overlapping genes (only one gene of each overlapping set was kept). Pearson's correlation values were calculated for all pairs formed by a mitochondrial gene and any protein-encoding gene, after excluding outliers, i.e. values outside the range [x̄ − 4SD, x̄ + 4SD] or [ȳ − 4SD, ȳ + 4SD] (x̄ and ȳ are the mean expression values of the two genes; SD stands for standard deviation). Gene pairs with a correlation higher than 0.9 or 0.8, which form large datasets, were represented as graphs and analyzed with data mining clustering techniques to help extract important information. Cytoscape software was used for graph analysis, making it possible to evaluate complex network parameters and identify connection properties of the biological networks.
The networks were enriched with functional data (pathways) from two biological databases: the Kyoto Encyclopedia of Genes and Genomes (https://www.genome.jp/kegg) and Gene Ontology (http://www.geneontology.org). This network enrichment helped to infer the biological functions of the correlated genes. Functional data were compared between tissues through hierarchical clustering: binary matrices were built, similarity matrices were computed with the Jaccard index, and agglomeration methods such as UPGMA (Unweighted Pair Group Method with Arithmetic Mean) and NJ (Neighbor Joining) were applied. A web platform was built to interactively visualize and analyze the trees resulting from these methods. Biologically, we confirmed the existence of highly correlated pairs formed by a mitochondrial gene and a protein-encoding gene, which are included in pathways of functional importance such as energy production and metabolite synthesis. Brain tissues had the largest and densest networks, while kidney cortex, whole blood and fibroblasts had large but sparser networks. Generally, the strongest correlations of mitochondrial genes encoded by mtDNA are with other genes encoded by the same genome, while mitochondrial genes encoded by nDNA are significantly correlated with other genes (mitochondrial or not) encoded by nDNA. This indicates that co-expression is stronger among genes encoded by the same genome. The pipeline and the web tree viewer developed in this work will be made available on GitHub under an open-source license, along with installation documentation. This will make it possible to use and adapt the tools for the analysis of datasets being released to the public, in the context of diseases or other species.
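As an illustration of two steps named in the abstract, the following is a minimal Python sketch, not the thesis pipeline itself. It assumes that the 4SD rule drops any sample in which either gene's expression lies more than four standard deviations from its mean, and that tissues are compared through the sets of pathways found in their networks (equivalent to rows of a binary matrix). The function names, the `n_sd` parameter and the use of 1 − Jaccard as a distance are hypothetical choices made for this example; NJ is not shown.

```python
from math import nan, sqrt
from statistics import mean, stdev


def filtered_pearson(x, y, n_sd=4.0):
    """Pearson's r over samples where both values lie within mean +/- n_sd * SD."""
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    # Assumed outlier rule: drop a sample if either gene is out of range.
    kept = [(a, b) for a, b in zip(x, y)
            if abs(a - mx) <= n_sd * sx and abs(b - my) <= n_sd * sy]
    if len(kept) < 2:
        return nan
    xs, ys = zip(*kept)
    kx, ky = mean(xs), mean(ys)
    cov = sum((a - kx) * (b - ky) for a, b in kept)
    denom = sqrt(sum((a - kx) ** 2 for a in xs) * sum((b - ky) ** 2 for b in ys))
    return cov / denom if denom else nan


def jaccard(a, b):
    """Jaccard index of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def jaccard_distances(sets):
    """Square matrix of 1 - Jaccard index for a list of sets."""
    return [[1.0 - jaccard(a, b) for b in sets] for a in sets]


def upgma(labels, dist):
    """Build a UPGMA tree (nested tuples) from a symmetric distance matrix."""
    clusters = {i: (label, 1) for i, label in enumerate(labels)}  # id -> (tree, size)
    d = {frozenset((i, j)): dist[i][j]
         for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        pair = min(d, key=d.get)              # closest pair of clusters
        i, j = sorted(pair)
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        del d[pair]
        for k in clusters:                    # size-weighted average linkage
            dik = d.pop(frozenset((i, k)))
            djk = d.pop(frozenset((j, k)))
            d[frozenset((next_id, k))] = (ni * dik + nj * djk) / (ni + nj)
        clusters[next_id] = ((ti, tj), ni + nj)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical toy data, not taken from the thesis.
tissue_pathways = {
    "tissue_A": {"oxidative phosphorylation", "TCA cycle"},
    "tissue_B": {"oxidative phosphorylation", "TCA cycle", "apoptosis"},
    "tissue_C": {"ribosome"},
}
names = list(tissue_pathways)
tree = upgma(names, jaccard_distances([tissue_pathways[n] for n in names]))
```

In this sketch a gene pair would be kept as a network edge when `filtered_pearson` exceeds the chosen cutoff (0.9 or 0.8 in the abstract), and `tree` groups the tissues whose pathway sets overlap most.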
id RCAP_aace8a184be7253a0f65ca7dc09e8f93
oai_identifier_str oai:repositorio-aberto.up.pt:10216/106161
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling 2023-11-29T15:52:54Z; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-20T00:34:29.854721; false
dc.title.none.fl_str_mv Redes de co-expressão entre genes codificantes de proteínas mitocondriais e todos os restantes genes nos vários tecidos humanos
title Redes de co-expressão entre genes codificantes de proteínas mitocondriais e todos os restantes genes nos vários tecidos humanos
author João Alexandre Ribeiro de Almeida
author_facet João Alexandre Ribeiro de Almeida
author_role author
dc.contributor.author.fl_str_mv João Alexandre Ribeiro de Almeida
dc.subject.por.fl_str_mv Engenharia electrotécnica, electrónica e informática
Electrical engineering, Electronic engineering, Information engineering
topic Engenharia electrotécnica, electrónica e informática
Electrical engineering, Electronic engineering, Information engineering
publishDate 2017
dc.date.none.fl_str_mv 2017-07-14
2017-07-14T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://hdl.handle.net/10216/106161
TID:201804522
url https://hdl.handle.net/10216/106161
identifier_str_mv TID:201804522
dc.language.iso.fl_str_mv por
language por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799136253804806144