Seleção de SNPs utilizando random forests

Bibliographic details
Main author: Frajacomo, Henrique Cordeiro
Publication date: 2020
Document type: Undergraduate thesis (Trabalho de Conclusão de Curso)
Language: Portuguese (por)
Source title: Repositório Institucional da UFSCAR
Full text: https://repositorio.ufscar.br/handle/ufscar/15891
Abstract: Single Nucleotide Polymorphisms (SNPs) are single-base variations in the nucleotide sequence between different individuals or between homologous sequences within a single organism. A large share of genetic variation occurs as SNPs. Many of these variations occur in plants, where they influence traits directly linked to the productivity of crops such as rice. In addition to being the largest producer among Western countries, Brazil is also the largest per capita consumer of rice. Rice is one of the main foods in human nutrition: it is the staple food for more than half of the world's population and is produced mostly by Asian countries, but also widely in Brazil. Rice is part of the Genetic Improvement Program of the Brazilian Agricultural Research Corporation (Embrapa), which aims to improve rice crops so that they match Brazilian consumer preferences. Selecting the SNPs that are strongly related to the amylose content of rice is one of the open problems in Embrapa’s Genetic Improvement Program. SNP selection can be modeled computationally with Machine Learning tools (Machine Learning being a subarea of Artificial Intelligence), which makes the analysis faster and less costly. The objective of this research is therefore to develop a method capable of performing the SNP selection task: given a trait of an organism, the method must find the SNPs related to that trait. As a test case, the method is applied to SNPs from the genomic content of different rice crops in order to find out which SNPs had the greatest impact on amylose content. The developed method proved to be efficient in solving the SNP selection problem. The analysis highlighted an SNP that Embrapa had validated experimentally as important for amylose content.
id SCAR_be064bf814e1c1754c7e4937ce16a91a
oai_identifier_str oai:repositorio.ufscar.br:ufscar/15891
network_acronym_str SCAR
network_name_str Repositório Institucional da UFSCAR
repository_id_str 4322
Funding: none received (author's statement: "Não recebi financiamento")
dc.title.por.fl_str_mv Seleção de SNPs utilizando random forests
dc.title.alternative.eng.fl_str_mv SNP Selection using random forests
dc.contributor.authorlattes.por.fl_str_mv http://lattes.cnpq.br/6231011286979492
dc.contributor.author.fl_str_mv Frajacomo, Henrique Cordeiro
dc.contributor.advisor1.fl_str_mv Cerri, Ricardo
dc.contributor.advisor1Lattes.fl_str_mv http://lattes.cnpq.br/6266519868438512
dc.contributor.authorID.fl_str_mv fee69377-ef03-433c-aa5b-f9b688f89d76
dc.subject.por.fl_str_mv Seleção de SNPs
Bioinformática
Aprendizado de máquina
dc.subject.eng.fl_str_mv SNP selection
Bioinformatics
Machine learning
Random forests
dc.subject.cnpq.fl_str_mv CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::METODOLOGIA E TECNICAS DA COMPUTACAO
dc.date.issued.fl_str_mv 2020-07-02
dc.date.accessioned.fl_str_mv 2022-04-21T12:25:39Z
dc.date.available.fl_str_mv 2022-04-21T12:25:39Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/bachelorThesis
dc.identifier.citation.fl_str_mv FRAJACOMO, Henrique Cordeiro. Seleção de SNPs utilizando random forests. 2020. Trabalho de Conclusão de Curso (Graduação em Ciência da Computação) – Universidade Federal de São Carlos, São Carlos, 2020. Disponível em: https://repositorio.ufscar.br/handle/ufscar/15891.
dc.language.iso.fl_str_mv por
dc.relation.confidence.fl_str_mv 600
600
dc.relation.authority.fl_str_mv c997f5ee-db84-40ed-8971-521dd105f2d1
dc.rights.driver.fl_str_mv Attribution-NonCommercial-NoDerivs 3.0 Brazil
http://creativecommons.org/licenses/by-nc-nd/3.0/br/
info:eu-repo/semantics/openAccess
dc.publisher.none.fl_str_mv Universidade Federal de São Carlos
Câmpus São Carlos
Ciência da Computação - CC
dc.publisher.initials.fl_str_mv UFSCar
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFSCAR
instname:Universidade Federal de São Carlos (UFSCAR)
instacron:UFSCAR
bitstream.url.fl_str_mv https://repositorio.ufscar.br/bitstream/ufscar/15891/1/TCC_Henrique_Final.pdf
https://repositorio.ufscar.br/bitstream/ufscar/15891/2/license_rdf
https://repositorio.ufscar.br/bitstream/ufscar/15891/3/TCC_Henrique_Final.pdf.txt
https://repositorio.ufscar.br/bitstream/ufscar/15891/4/TCC_Henrique_Final.pdf.jpg
bitstream.checksum.fl_str_mv 9d0ac88a1af9ffd8e212edc919d3b71b
e39d27027a6cc9cb039ad269a5db8e34
de3b1fe67b807625d37843ccbffc8322
1db5af2b7999a1d0eb1c399f263a86e3
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5