Seleção de SNPs utilizando random forests
Main author: | Frajacomo, Henrique Cordeiro |
---|---|
Publication date: | 2020 |
Document type: | Undergraduate thesis (Trabalho de conclusão de curso) |
Language: | por |
Source title: | Repositório Institucional da UFSCAR |
Full text: | https://repositorio.ufscar.br/handle/ufscar/15891 |
Abstract: | Single Nucleotide Polymorphisms (SNPs) are single-base variations in the nucleotide sequence between different individuals, or between homologous sequences within one organism. A large share of genetic variation occurs as SNPs. Many of these variations occur in plants, where they influence traits directly linked to the productivity of crops such as rice. Rice is one of the main foods in human nutrition and the dietary staple for more than half of the world's population. It is produced mostly by Asian countries but is also widely grown in Brazil, which is both the largest producer among Western countries and the largest per capita consumer of rice. Rice is part of the Genetic Improvement Program of the Brazilian Agricultural Research Corporation (Embrapa), which aims to improve rice crops so that they match Brazilian consumer preferences. Selecting the SNPs that are strongly related to the amylose content of rice is one of the problems to be solved in Embrapa's Genetic Improvement Program. SNP selection can be modeled computationally with Machine Learning tools (Machine Learning being a subarea of Artificial Intelligence), which makes the analysis faster and less costly. The objective of this research is therefore to develop a method capable of performing the SNP selection task: given a trait of an organism, the method must find the SNPs related to that trait. As a test case, the method is applied to the SNPs in the genomic content of different rice crops, in order to find which SNPs had the greatest impact on their amylose content. The developed method proved efficient in solving the SNP selection problem. The analysis highlighted an SNP that Embrapa had validated experimentally as important for amylose content. |
id |
SCAR_be064bf814e1c1754c7e4937ce16a91a |
---|---|
oai_identifier_str |
oai:repositorio.ufscar.br:ufscar/15891 |
network_acronym_str |
SCAR |
network_name_str |
Repositório Institucional da UFSCAR |
repository_id_str |
4322 |
spelling |
Frajacomo, Henrique Cordeiro; Cerri, Ricardo; 2020-07-02; Seleção de SNPs utilizando random forests; SNP Selection using random forests; funding: none received (Não recebi financiamento); TCC_Henrique_Final.pdf (TCC Final, application/pdf, 675825); license_rdf (application/rdf+xml; charset=utf-8, 811); TCC_Henrique_Final.pdf.txt (Extracted text, text/plain, 51363); TCC_Henrique_Final.pdf.jpg (IM Thumbnail, image/jpeg, 6011); 2023-09-18 18:32:18.714; opendoar:4322 |
dc.title.por.fl_str_mv |
Seleção de SNPs utilizando random forests |
dc.title.alternative.eng.fl_str_mv |
SNP Selection using random forests |
title |
Seleção de SNPs utilizando random forests |
author |
Frajacomo, Henrique Cordeiro |
author_facet |
Frajacomo, Henrique Cordeiro |
author_role |
author |
dc.contributor.authorlattes.por.fl_str_mv |
http://lattes.cnpq.br/6231011286979492 |
dc.contributor.author.fl_str_mv |
Frajacomo, Henrique Cordeiro |
dc.contributor.advisor1.fl_str_mv |
Cerri, Ricardo |
dc.contributor.advisor1Lattes.fl_str_mv |
http://lattes.cnpq.br/6266519868438512 |
dc.contributor.authorID.fl_str_mv |
fee69377-ef03-433c-aa5b-f9b688f89d76 |
contributor_str_mv |
Cerri, Ricardo |
dc.subject.por.fl_str_mv |
Seleção de SNPs Bioinformática Aprendizado de máquina |
topic |
Seleção de SNPs Bioinformática Aprendizado de máquina SNP selection Bioinformatics Machine learning Random forests CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::METODOLOGIA E TECNICAS DA COMPUTACAO |
dc.subject.eng.fl_str_mv |
SNP selection Bioinformatics Machine learning Random forests |
dc.subject.cnpq.fl_str_mv |
CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::METODOLOGIA E TECNICAS DA COMPUTACAO |
publishDate |
2020 |
dc.date.issued.fl_str_mv |
2020-07-02 |
dc.date.accessioned.fl_str_mv |
2022-04-21T12:25:39Z |
dc.date.available.fl_str_mv |
2022-04-21T12:25:39Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/bachelorThesis |
format |
bachelorThesis |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
FRAJACOMO, Henrique Cordeiro. Seleção de SNPs utilizando random forests. 2020. Trabalho de Conclusão de Curso (Graduação em Ciência da Computação) – Universidade Federal de São Carlos, São Carlos, 2020. Disponível em: https://repositorio.ufscar.br/handle/ufscar/15891. |
dc.identifier.uri.fl_str_mv |
https://repositorio.ufscar.br/handle/ufscar/15891 |
identifier_str_mv |
FRAJACOMO, Henrique Cordeiro. Seleção de SNPs utilizando random forests. 2020. Trabalho de Conclusão de Curso (Graduação em Ciência da Computação) – Universidade Federal de São Carlos, São Carlos, 2020. Disponível em: https://repositorio.ufscar.br/handle/ufscar/15891. |
url |
https://repositorio.ufscar.br/handle/ufscar/15891 |
dc.language.iso.fl_str_mv |
por |
language |
por |
dc.relation.confidence.fl_str_mv |
600 600 |
dc.relation.authority.fl_str_mv |
c997f5ee-db84-40ed-8971-521dd105f2d1 |
dc.rights.driver.fl_str_mv |
Attribution-NonCommercial-NoDerivs 3.0 Brazil http://creativecommons.org/licenses/by-nc-nd/3.0/br/ info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Attribution-NonCommercial-NoDerivs 3.0 Brazil http://creativecommons.org/licenses/by-nc-nd/3.0/br/ |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
Universidade Federal de São Carlos Câmpus São Carlos Ciência da Computação - CC |
dc.publisher.initials.fl_str_mv |
UFSCar |
publisher.none.fl_str_mv |
Universidade Federal de São Carlos Câmpus São Carlos Ciência da Computação - CC |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFSCAR instname:Universidade Federal de São Carlos (UFSCAR) instacron:UFSCAR |
instname_str |
Universidade Federal de São Carlos (UFSCAR) |
instacron_str |
UFSCAR |
institution |
UFSCAR |
reponame_str |
Repositório Institucional da UFSCAR |
collection |
Repositório Institucional da UFSCAR |
bitstream.url.fl_str_mv |
https://repositorio.ufscar.br/bitstream/ufscar/15891/1/TCC_Henrique_Final.pdf https://repositorio.ufscar.br/bitstream/ufscar/15891/2/license_rdf https://repositorio.ufscar.br/bitstream/ufscar/15891/3/TCC_Henrique_Final.pdf.txt https://repositorio.ufscar.br/bitstream/ufscar/15891/4/TCC_Henrique_Final.pdf.jpg |
bitstream.checksum.fl_str_mv |
9d0ac88a1af9ffd8e212edc919d3b71b e39d27027a6cc9cb039ad269a5db8e34 de3b1fe67b807625d37843ccbffc8322 1db5af2b7999a1d0eb1c399f263a86e3 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR) |
repository.mail.fl_str_mv |
|
_version_ |
1813715645773119488 |