Ancestralidade e co-regulação de genes codificadores de proteínas humanas

Kátia de Paiva Lopes


Bibliographic details
Main author:	Kátia de Paiva Lopes
Publication date:	2017
Document type:	Thesis
Language:	por
Source title:	Repositório Institucional da UFMG
Full text:	http://hdl.handle.net/1843/BUOS-APTQ3E
Abstract:	Several technologies have been developed to characterize and quantify the transcriptome, among them those based on clone sequencing, such as EST (Expressed Sequence Tags); hybridization, such as microarrays; and NGS deep sequencing, such as RNA-seq. To study the origin of genes expressed in different tissues and organs, we analyzed data obtained from these three approaches. Data from Unigene, Gene Expression Omnibus (GEO) and Human Protein Atlas (HPA) were compiled into eight local databases. Next, using the orthologous groups of human genes provided by the UniRef Enriched KEGG Orthology (UEKO) and Orthologous Matrix (OMA) databases, we estimated gene ages with the Lowest Common Ancestor (LCA) algorithm. We were thus able to determine the time of appearance of tissue-expressed genes, with the aim of depicting the evolution of human organs. The global analysis of the organism revealed eight distinct hallmarks along the timescale (i.e., eight major steps) and showed that housekeeping (HK) genes are more ancient than tissue-enriched (TE) genes. The functional enrichment analysis found coherent groups of terms and annotations assigned to the genes placed at each evolutionary stage. Next, a coexpression analysis was performed by calculating the pairwise Spearman correlation of all genes across 116 samples from HPA and keeping as positive gene pairs only those with a correlation coefficient ≥ 0.85. The result was a robust network of 2,298 proteins and 20,005 interactions. In this network, the MCODE algorithm from Cytoscape revealed 11 major subnetworks clearly enriched in groups or modules of highly coexpressed proteins, which tended to include proteins of the same evolutionary age.
Finally, for the analysis of tissue-specific (TS) genes, we used three different strategies: (1) tissue clustering; (2) tissue classification according to phenotypic categories; and (3) using the eight tissues common to the four databases used in this step: HPA (32 tissues), IBM (16), FANTOM (56) and GTEx (53). Our results showed that when all expressed genes are used, the analysis lacks a tissue-specific signature and approaches the age distribution of the entire gene repertoire. Thus, to distinguish the origins of the organs, we examined the time of appearance of only tissue-specific genes or genes within distinct groups, such as elevated genes. The approach with the highest concordance of results ordered the tissues by gene age as follows: first brain, then heart, kidney, colon, ovary, prostate, lung and testis.
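
The two computational steps named in the abstract, gene-age estimation by Lowest Common Ancestor and the Spearman coexpression network with a 0.85 cutoff, can be illustrated with a minimal sketch. It is not the thesis code: the function names, the toy expression values and the simplified taxonomic lineages are all assumptions made for illustration, and only the 0.85 threshold comes from the abstract.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


def coexpression_edges(expression, threshold=0.85):
    """Return (gene1, gene2, rho) for pairs with Spearman rho >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        rho = spearman(expression[g1], expression[g2])
        if rho >= threshold:
            edges.append((g1, g2, rho))
    return edges


def lowest_common_ancestor(lineages):
    """Deepest clade shared by all lineages (each listed root-first)."""
    lca = None
    for clades in zip(*lineages):
        if len(set(clades)) != 1:
            break
        lca = clades[0]
    return lca


# Toy expression profiles (hypothetical values, 5 samples per gene).
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.1, 6.5, 8.0, 11.0],
    "geneC": [5.0, 3.0, 4.0, 1.0, 2.0],
}
edges = coexpression_edges(expression)

# Simplified lineages of species assumed to carry orthologs of one human gene.
lineages = [
    ["cellular organisms", "Eukaryota", "Opisthokonta", "Metazoa",
     "Chordata", "Mammalia", "Homo sapiens"],
    ["cellular organisms", "Eukaryota", "Opisthokonta", "Metazoa",
     "Chordata", "Actinopterygii", "Danio rerio"],
    ["cellular organisms", "Eukaryota", "Opisthokonta", "Fungi",
     "Saccharomyces cerevisiae"],
]
lca = lowest_common_ancestor(lineages)
```

In this toy input only the geneA/geneB pair passes the cutoff, and the gene is dated to the clade shared by all three species. A real analysis would draw the orthologous groups from UEKO or OMA and the lineages from a full taxonomy, and would likely use a vectorized correlation routine for the roughly 20,000 human protein-coding genes.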

Item metadata

id	UFMG_3854bba0d37712e0c1a1f96cbb8bae9e
oai_identifier_str	oai:repositorio.ufmg.br:1843/BUOS-APTQ3E
network_acronym_str	UFMG
network_name_str	Repositório Institucional da UFMG
repository_id_str
spelling	Several technologies have been developed to characterize and quantify the transcriptome, among them those based on cloning and sequencing, such as EST (Expressed Sequence Tags); large-scale hybridization, as in microarrays; and next-generation sequencing (NGS), as in RNA-seq. To study the origin of the genes transcribed in different human organs, data from these three approaches were analyzed through the creation of eight local databases: ESTs from Unigene, microarray data from Gene Expression Omnibus (GEO) and RNA-seq data from the Human Protein Atlas (HPA). Then, using the UEKO and OMA tools and the program that computes the LCA (Lowest Common Ancestor), it was possible to estimate the clade of origin of each gene and to examine the evolutionary dynamics of each tissue as well as of the whole organism. The global analysis of the organism using the phylostratigraphy methodology revealed evolutionary landmarks with a greater emergence of genes, which we divided into eight evolutionary stages. This analysis also revealed that housekeeping genes are older than tissue-enriched genes, and the functional enrichment analysis yielded coherent terms and annotations for each group of genes mapped to its evolutionary stage. Next, for the gene coexpression analysis, a network comprising 2,298 proteins and 20,005 interactions was built using only gene pairs with a Spearman correlation >= 0.85. In this network, the MCODE algorithm from Cytoscape delimited 11 subnetworks and showed close links between proteins of the same evolutionary stage. Finally, for the analysis of tissue-specific genes, three different strategies were used: (1) tissue clustering; (2) classification into tissue levels according to phenotypic categories; and (3) using the eight tissues common to the four databases used for this analysis: HPA (32 tissues), IBM (16), FANTOM (56) and GTEx (53). This last analysis showed that a subgroup of the expressed genes must be used to differentiate the evolutionary dynamics, because when all expressed genes are used, even separately by tissue, the final result is the evolutionary dynamics of the whole organism. Thus, the approach with the highest concordance of results gives the following order of emergence of the genes that make up the respective tissues in Homo sapiens: first the brain-specific genes, then heart, kidney, colon, ovary, prostate, lung and testis.
dc.title.none.fl_str_mv	Ancestralidade e co-regulação de genes codificadores de proteínas humanas
title	Ancestralidade e co-regulação de genes codificadores de proteínas humanas
spellingShingle	Ancestralidade e co-regulação de genes codificadores de proteínas humanas Kátia de Paiva Lopes Bioinformatics
title_short	Ancestralidade e co-regulação de genes codificadores de proteínas humanas
title_full	Ancestralidade e co-regulação de genes codificadores de proteínas humanas
title_fullStr	Ancestralidade e co-regulação de genes codificadores de proteínas humanas
title_full_unstemmed	Ancestralidade e co-regulação de genes codificadores de proteínas humanas
title_sort	Ancestralidade e co-regulação de genes codificadores de proteínas humanas
author	Kátia de Paiva Lopes
author_facet	Kátia de Paiva Lopes
author_role	author
dc.contributor.none.fl_str_mv	Jose Miguel Ortega Javier De Las Rivas Gloria Regina Franco Francisco Pereira Lobo Gabriel da Rocha Fernandes Sandro José de Souza
dc.contributor.author.fl_str_mv	Kátia de Paiva Lopes
dc.subject.por.fl_str_mv	Bioinformatics
topic	Bioinformatics
publishDate	2017
dc.date.none.fl_str_mv	2017-02-20 2019-08-09T20:45:31Z 2019-08-09T20:45:31Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/doctoralThesis
format	doctoralThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/1843/BUOS-APTQ3E
url	http://hdl.handle.net/1843/BUOS-APTQ3E
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Federal de Minas Gerais UFMG
publisher.none.fl_str_mv	Universidade Federal de Minas Gerais UFMG
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFMG instname:Universidade Federal de Minas Gerais (UFMG) instacron:UFMG
instname_str	Universidade Federal de Minas Gerais (UFMG)
instacron_str	UFMG
institution	UFMG
reponame_str	Repositório Institucional da UFMG
collection	Repositório Institucional da UFMG
repository.name.fl_str_mv	Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG)
repository.mail.fl_str_mv	repositorio@ufmg.br
_version_	1816829751240687616
