Redes de co-expressão entre genes codificantes de proteínas mitocondriais e todos os restantes genes nos vários tecidos humanos
Main author: | João Alexandre Ribeiro de Almeida |
---|---|
Publication date: | 2017 |
Document type: | Dissertation |
Language: | por |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | https://hdl.handle.net/10216/106161 |
Abstract: | Recent advances in genome sequencing allow the activity of all identified human genes (≈ 22,000 protein-encoding genes) to be studied in different contexts. However, current knowledge of gene interactions lags behind, especially when one of the elements is a mitochondrial protein-encoding gene (≈ 1,500). Mitochondrial proteins are encoded either by mitochondrial DNA (mtDNA; 13 proteins) or by nuclear DNA (nDNA; the remainder), which implies coordinated communication between the two genomes. Since mitochondria coordinate several life-critical cellular activities, namely energy production and cell death, deregulation of this communication is implicated in many complex diseases, such as neurodegenerative diseases, cancer and diabetes. This work therefore aimed to identify groups of highly co-expressed gene pairs (a mitochondrial gene paired with any other gene) and the associated protein networks in human tissues. Gene expression data were collected from the Genotype-Tissue Expression database (https://www.gtexportal.org/home/), covering 49 tissues (8527 samples in total, an average of 174 per tissue). The data were filtered to include only protein-encoding, physically non-overlapping genes (only one gene of each overlapping group was kept). Pearson's correlation values were calculated for all pairs formed by a mitochondrial gene and any protein-encoding gene, and outliers outside the range [x − 4SD, x + 4SD] or [y − 4SD, y + 4SD] (SD stands for standard deviation) were excluded. Gene pairs with a correlation higher than 0.9 or 0.8, which correspond to large datasets, were represented as graphs and analyzed with data mining clustering techniques to help extract important information. Cytoscape software was used for graph analysis, making it possible to evaluate complex network parameters and identify connection properties of the biological networks. The networks were enriched with functional data (pathways) from two biological databases: Kyoto Encyclopedia of Genes and Genomes (https://www.genome.jp/kegg) and Gene Ontology (http://www.geneontology.org). This enrichment helped to infer the biological functions of the correlated genes. Functional data were compared between tissues through hierarchical clustering, by building binary matrices, computing similarity matrices with the Jaccard index, and applying agglomeration methods such as UPGMA (Unweighted Pair Group Method with Arithmetic Mean) and NJ (Neighbor Joining). A web platform was built to interactively visualize and analyze the trees produced by these methods. Biologically, we confirmed the existence of highly correlated pairs formed by a mitochondrial gene and any protein-encoding gene, which are included in pathways of functional importance such as energy production and metabolite synthesis. Brain tissues had the largest and densest networks, while kidney cortex, whole blood and fibroblasts had large but sparser networks. Generally, mitochondrial genes encoded by mtDNA are most strongly correlated with other genes encoded by the same genome, while mitochondrial genes encoded by nDNA are significantly correlated with other genes (mitochondrial or not) encoded by nDNA. This suggests that correlation is stronger among genes encoded by the same genome. The pipeline and the web tree viewer developed in this work will be available on GitHub under an open source license, along with installation documentation. This will make it possible to use and adapt the tools to analyze datasets as they are released to the public, in the context of diseases or other species. |
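The correlation and comparison steps named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis pipeline: the function names, the toy expression values, the use of the population standard deviation, the choice to drop outlying samples before computing the correlation, and the convention for two empty sets are all assumptions made for illustration. The sketch covers the 4SD outlier exclusion, Pearson's correlation, thresholding into graph edges, and the Jaccard index; the UPGMA and NJ tree-building steps are not shown.

```python
import math


def mean_sd(values):
    """Return the mean and the population standard deviation (an assumption)."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, math.sqrt(var)


def drop_outliers(x, y, k=4.0):
    """Keep only the samples where x and y both lie within mean +/- k*SD."""
    mx, sx = mean_sd(x)
    my, sy = mean_sd(y)
    kept = [(a, b) for a, b in zip(x, y)
            if abs(a - mx) <= k * sx and abs(b - my) <= k * sy]
    return [a for a, _ in kept], [b for _, b in kept]


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    mx, sx = mean_sd(x)
    my, sy = mean_sd(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def coexpression_edges(expr, mito_genes, threshold=0.9):
    """Return (mito_gene, other_gene, r) for every pair with r > threshold.

    expr maps a gene name to its expression values across samples.
    Each unordered pair is reported once, even if both genes are mitochondrial.
    """
    edges, seen = [], set()
    for m in mito_genes:
        for g in expr:
            pair = frozenset((m, g))
            if g == m or pair in seen:
                continue
            seen.add(pair)
            x, y = drop_outliers(expr[m], expr[g])
            if len(x) < 3:  # too few samples left to correlate (assumed cutoff)
                continue
            r = pearson(x, y)
            if r > threshold:
                edges.append((m, g, round(r, 3)))
    return edges


def jaccard(a, b):
    """Jaccard index of two collections, e.g. pathway sets of two tissues."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Toy values for illustration only; they are not GTEx data.
    toy_expr = {
        "MT-CO1": [1.0, 2.0, 3.0, 4.0, 5.0],
        "MT-ND1": [2.0, 4.0, 6.0, 8.0, 10.0],
        "GENE_A": [5.0, 3.0, 4.0, 1.0, 2.0],
    }
    print(coexpression_edges(toy_expr, ["MT-CO1", "MT-ND1"]))
    print(jaccard({"pathway_1", "pathway_2"}, {"pathway_2", "pathway_3"}))
```

The edge list returned by `coexpression_edges` is the kind of structure that could be exported to a graph tool such as Cytoscape, and a matrix of pairwise `jaccard` values between tissues is the usual input to an agglomerative method such as UPGMA.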
id | RCAP_aace8a184be7253a0f65ca7dc09e8f93 |
---|---|
oai_identifier_str | oai:repositorio-aberto.up.pt:10216/106161 |
network_acronym_str | RCAP |
network_name_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str | 7160 |
dc.title.none.fl_str_mv | Redes de co-expressão entre genes codificantes de proteínas mitocondriais e todos os restantes genes nos vários tecidos humanos |
title | Redes de co-expressão entre genes codificantes de proteínas mitocondriais e todos os restantes genes nos vários tecidos humanos |
author | João Alexandre Ribeiro de Almeida |
author_facet | João Alexandre Ribeiro de Almeida |
author_role | author |
dc.contributor.author.fl_str_mv | João Alexandre Ribeiro de Almeida |
dc.subject.por.fl_str_mv | Engenharia electrotécnica, electrónica e informática; Electrical engineering, Electronic engineering, Information engineering |
topic | Engenharia electrotécnica, electrónica e informática; Electrical engineering, Electronic engineering, Information engineering |
publishDate | 2017 |
dc.date.none.fl_str_mv | 2017-07-14 2017-07-14T00:00:00Z |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/masterThesis |
format | masterThesis |
status_str | publishedVersion |
dc.identifier.uri.fl_str_mv | https://hdl.handle.net/10216/106161 TID:201804522 |
url | https://hdl.handle.net/10216/106161 |
identifier_str_mv | TID:201804522 |
dc.language.iso.fl_str_mv | por |
language | por |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
eu_rights_str_mv | openAccess |
dc.format.none.fl_str_mv | application/pdf |
dc.source.none.fl_str_mv | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str | RCAAP |
institution | RCAAP |
reponame_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv | |
_version_ | 1799136253804806144 |