Geminivirus data warehouse: a database enriched with machine learning approaches

Bibliographic details
Main author: Silva, Jose Cleydson F.
Publication date: 2017
Other authors: Carvalho, Thales F. M., Basso, Marcos F., Deguchi, Michihito, Pereira, Welison A., R. Sobrinho, Roberto, Vidigal, Pedro M. P., Brustolini, Otávio J. B., Silva, Fabyano F., Dal-Bianco, Maximiller, Fontes, Renildes L. F., Santos, Anésia A., Zerbini, Francisco Murilo, Cerqueira, Fabio R., Fontes, Elizabeth P. B.
Document type: Article
Language: eng
Source title: Repositório Institucional da UFLA
Full text: http://repositorio.ufla.br/jspui/handle/1/32708
Abstract: Background: The Geminiviridae family encompasses a group of single-stranded DNA viruses with twinned and quasi-isometric virions, which infect a wide range of dicotyledonous and monocotyledonous plants and are responsible for significant economic losses worldwide. Geminiviruses are divided into nine genera, according to their insect vector, host range, genome organization, and phylogeny reconstruction. Using rolling-circle amplification approaches along with high-throughput sequencing technologies, thousands of full-length geminivirus and satellite genome sequences were amplified and have become available in public databases. As a consequence, many important challenges have emerged, namely, how to classify, store, and analyze massive datasets as well as how to extract information or new knowledge. Data mining approaches, mainly supported by machine learning (ML) techniques, are a natural means for high-throughput data analysis in the context of genomics, transcriptomics, proteomics, and metabolomics. Results: Here, we describe the development of a data warehouse enriched with ML approaches, designated geminivirus.org. We implemented search modules, bioinformatics tools, and ML methods to retrieve high-precision information, demarcate species, and create classifiers for genera and open reading frames (ORFs) of geminivirus genomes. Conclusions: The use of data mining techniques such as ETL (Extract, Transform, Load) to feed our database, as well as algorithms based on machine learning for knowledge extraction, allowed us to obtain a database with quality data and suitable tools for bioinformatics analysis. The Geminivirus Data Warehouse (geminivirus.org) offers a simple and user-friendly environment for information retrieval and knowledge discovery related to geminiviruses.
id UFLA_9f2b26f58c3cf09db78efc5171b44757
oai_identifier_str oai:localhost:1/32708
network_acronym_str UFLA
network_name_str Repositório Institucional da UFLA
repository_id_str
dc.contributor.author.fl_str_mv Silva, Jose Cleydson F.
Carvalho, Thales F. M.
Basso, Marcos F.
Deguchi, Michihito
Pereira, Welison A.
R. Sobrinho, Roberto
Vidigal, Pedro M. P.
Brustolini, Otávio J. B.
Silva, Fabyano F.
Dal-Bianco, Maximiller
Fontes, Renildes L. F.
Santos, Anésia A.
Zerbini, Francisco Murilo
Cerqueira, Fabio R.
Fontes, Elizabeth P. B.
dc.subject.por.fl_str_mv Machine learning
Random forest
Knowledge Discovery in Databases (KDD)
Data mining
Data warehouse
Geminivirus
publishDate 2017
dc.date.none.fl_str_mv 2017-05-05
2019-02-01T19:59:20Z
2019-02-01T19:59:20Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv SILVA, J. C. F. et al. Geminivirus data warehouse: a database enriched with machine learning approaches. BMC Bioinformatics, [S.l.], v. 18, p. 1-11, 2017.
http://repositorio.ufla.br/jspui/handle/1/32708
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv Attribution 4.0 International
http://creativecommons.org/licenses/by/4.0/
info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Springer
dc.source.none.fl_str_mv BMC Bioinformatics
reponame:Repositório Institucional da UFLA
instname:Universidade Federal de Lavras (UFLA)
instacron:UFLA
instname_str Universidade Federal de Lavras (UFLA)
instacron_str UFLA
institution UFLA
reponame_str Repositório Institucional da UFLA
collection Repositório Institucional da UFLA
repository.name.fl_str_mv Repositório Institucional da UFLA - Universidade Federal de Lavras (UFLA)
repository.mail.fl_str_mv nivaldo@ufla.br || repositorio.biblioteca@ufla.br
_version_ 1815439042405203968