Geminivirus data warehouse: a database enriched with machine learning approaches
Main author: | Silva, Jose Cleydson F. |
---|---|
Publication date: | 2017 |
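Other authors: | Carvalho, Thales F. M.; Basso, Marcos F.; Deguchi, Michihito; Pereira, Welison A.; Vidigal, Pedro M. P.; Brustolini, Otávio J. B.; Silva, Fabyano F.; Dal-Bianco, Maximiller; Fontes, Renildes L. F.; Santos, Anésia A.; Zerbini, Francisco Murilo; Cerqueira, Fabio R.; Fontes, Elizabeth P. B.; R. Sobrinho, Roberto |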
Document type: | Article |
Language: | eng |
Source title: | LOCUS Repositório Institucional da UFV |
Full text: | https://doi.org/10.1186/s12859-017-1646-4 http://www.locus.ufv.br/handle/123456789/12748 |
Abstract: | The Geminiviridae family encompasses a group of single-stranded DNA viruses with twinned and quasi-isometric virions, which infect a wide range of dicotyledonous and monocotyledonous plants and are responsible for significant economic losses worldwide. Geminiviruses are divided into nine genera, according to their insect vector, host range, genome organization, and phylogeny reconstruction. Using rolling-circle amplification approaches along with high-throughput sequencing technologies, thousands of full-length geminivirus and satellite genome sequences were amplified and have become available in public databases. As a consequence, many important challenges have emerged, namely, how to classify, store, and analyze massive datasets as well as how to extract information or new knowledge. Data mining approaches, mainly supported by machine learning (ML) techniques, are a natural means for high-throughput data analysis in the context of genomics, transcriptomics, proteomics, and metabolomics. Here, we describe the development of a data warehouse enriched with ML approaches, designated geminivirus.org. We implemented search modules, bioinformatics tools, and ML methods to retrieve high-precision information, demarcate species, and create classifiers for genera and open reading frames (ORFs) of geminivirus genomes. The use of data mining techniques such as ETL (Extract, Transform, Load) to feed our database, as well as algorithms based on machine learning for knowledge extraction, allowed us to obtain a database with quality data and suitable tools for bioinformatics analysis. The Geminivirus Data Warehouse (geminivirus.org) offers a simple and user-friendly environment for information retrieval and knowledge discovery related to geminiviruses. |
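The abstract names ETL (Extract, Transform, Load) as the way the database is fed. The record does not describe the authors' actual pipeline, so the following is only a minimal sketch of the general ETL pattern, assuming FASTA-formatted input and an SQLite table. The accessions, sequences, `genus=` header convention, table name, and column names are all hypothetical and chosen for illustration.

```python
import sqlite3

# Hypothetical FASTA text; accessions and sequences are invented for illustration.
RAW_FASTA = """>ACC0001 genus=Begomovirus
acgtacgtnn
ACGT
>ACC0002 genus=Mastrevirus
ttgacca
"""

def extract(fasta_text):
    """Extract: yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def transform(header, sequence):
    """Transform: normalize case and derive length and GC fraction."""
    accession, _, rest = header.partition(" ")
    # Assumed header convention "genus=<name>"; missing values become None.
    genus = rest.partition("genus=")[2].strip() or None
    seq = sequence.upper()
    gc = sum(seq.count(base) for base in "GC")
    known = sum(seq.count(base) for base in "ACGT")  # ignore ambiguous bases
    gc_fraction = gc / known if known else 0.0
    return accession, genus, seq, len(seq), gc_fraction

def load(rows, conn):
    """Load: write the transformed rows into a (hypothetical) genome table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS genome ("
        "accession TEXT PRIMARY KEY, genus TEXT, sequence TEXT, "
        "length INTEGER, gc_fraction REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO genome VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()

def run_etl(fasta_text, conn):
    load([transform(h, s) for h, s in extract(fasta_text)], conn)

conn = sqlite3.connect(":memory:")
run_etl(RAW_FASTA, conn)
```

Keeping the three stages as separate functions means the source (for example, a public sequence database export) or the target store can be swapped without touching the normalization step.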
id |
UFV_0bbd679d977cd974e830475792b4e5dd |
---|---|
oai_identifier_str |
oai:locus.ufv.br:123456789/12748 |
network_acronym_str |
UFV |
network_name_str |
LOCUS Repositório Institucional da UFV |
repository_id_str |
2145 |
dc.title.en.fl_str_mv |
Geminivirus data warehouse: a database enriched with machine learning approaches |
title |
Geminivirus data warehouse: a database enriched with machine learning approaches |
spellingShingle |
Geminivirus data warehouse: a database enriched with machine learning approaches Silva, Jose Cleydson F. Machine learning Knowledge discovery Data mining Geminivirus Data warehouse Random forest |
title_short |
Geminivirus data warehouse: a database enriched with machine learning approaches |
title_full |
Geminivirus data warehouse: a database enriched with machine learning approaches |
title_fullStr |
Geminivirus data warehouse: a database enriched with machine learning approaches |
title_full_unstemmed |
Geminivirus data warehouse: a database enriched with machine learning approaches |
title_sort |
Geminivirus data warehouse: a database enriched with machine learning approaches |
author |
Silva, Jose Cleydson F. |
author_facet |
Silva, Jose Cleydson F.; Carvalho, Thales F. M.; Basso, Marcos F.; Deguchi, Michihito; Pereira, Welison A.; Vidigal, Pedro M. P.; Brustolini, Otávio J. B.; Silva, Fabyano F.; Dal-Bianco, Maximiller; Fontes, Renildes L. F.; Santos, Anésia A.; Zerbini, Francisco Murilo; Cerqueira, Fabio R.; Fontes, Elizabeth P. B.; R. Sobrinho, Roberto |
author_role |
author |
author2 |
Carvalho, Thales F. M.; Basso, Marcos F.; Deguchi, Michihito; Pereira, Welison A.; Vidigal, Pedro M. P.; Brustolini, Otávio J. B.; Silva, Fabyano F.; Dal-Bianco, Maximiller; Fontes, Renildes L. F.; Santos, Anésia A.; Zerbini, Francisco Murilo; Cerqueira, Fabio R.; Fontes, Elizabeth P. B.; R. Sobrinho, Roberto |
author2_role |
author author author author author author author author author author author author author author |
dc.contributor.author.fl_str_mv |
Silva, Jose Cleydson F.; Carvalho, Thales F. M.; Basso, Marcos F.; Deguchi, Michihito; Pereira, Welison A.; Vidigal, Pedro M. P.; Brustolini, Otávio J. B.; Silva, Fabyano F.; Dal-Bianco, Maximiller; Fontes, Renildes L. F.; Santos, Anésia A.; Zerbini, Francisco Murilo; Cerqueira, Fabio R.; Fontes, Elizabeth P. B.; R. Sobrinho, Roberto |
dc.subject.pt-BR.fl_str_mv |
Machine learning; Knowledge discovery; Data mining; Geminivirus; Data warehouse; Random forest |
topic |
Machine learning; Knowledge discovery; Data mining; Geminivirus; Data warehouse; Random forest |
description |
The Geminiviridae family encompasses a group of single-stranded DNA viruses with twinned and quasi-isometric virions, which infect a wide range of dicotyledonous and monocotyledonous plants and are responsible for significant economic losses worldwide. Geminiviruses are divided into nine genera, according to their insect vector, host range, genome organization, and phylogeny reconstruction. Using rolling-circle amplification approaches along with high-throughput sequencing technologies, thousands of full-length geminivirus and satellite genome sequences were amplified and have become available in public databases. As a consequence, many important challenges have emerged, namely, how to classify, store, and analyze massive datasets as well as how to extract information or new knowledge. Data mining approaches, mainly supported by machine learning (ML) techniques, are a natural means for high-throughput data analysis in the context of genomics, transcriptomics, proteomics, and metabolomics. Here, we describe the development of a data warehouse enriched with ML approaches, designated geminivirus.org. We implemented search modules, bioinformatics tools, and ML methods to retrieve high precision information, demarcate species, and create classifiers for genera and open reading frames (ORFs) of geminivirus genomes. The use of data mining techniques such as ETL (Extract, Transform, Load) to feed our database, as well as algorithms based on machine learning for knowledge extraction, allowed us to obtain a database with quality data and suitable tools for bioinformatics analysis. The Geminivirus Data Warehouse (geminivirus.org) offers a simple and user-friendly environment for information retrieval and knowledge discovery related to geminiviruses. |
publishDate |
2017 |
dc.date.accessioned.fl_str_mv |
2017-11-06T09:22:19Z |
dc.date.available.fl_str_mv |
2017-11-06T09:22:19Z |
dc.date.issued.fl_str_mv |
2017-05-05 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://doi.org/10.1186/s12859-017-1646-4 http://www.locus.ufv.br/handle/123456789/12748 |
dc.identifier.issn.none.fl_str_mv |
1471-2105 |
identifier_str_mv |
1471-2105 |
url |
https://doi.org/10.1186/s12859-017-1646-4 http://www.locus.ufv.br/handle/123456789/12748 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.ispartofseries.pt-BR.fl_str_mv |
v. 18, n. 240, May. 2017 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
BioMed Central Bioinformatics |
publisher.none.fl_str_mv |
BioMed Central Bioinformatics |
dc.source.none.fl_str_mv |
reponame:LOCUS Repositório Institucional da UFV instname:Universidade Federal de Viçosa (UFV) instacron:UFV |
instname_str |
Universidade Federal de Viçosa (UFV) |
instacron_str |
UFV |
institution |
UFV |
reponame_str |
LOCUS Repositório Institucional da UFV |
collection |
LOCUS Repositório Institucional da UFV |
bitstream.url.fl_str_mv |
https://locus.ufv.br//bitstream/123456789/12748/1/document.pdf https://locus.ufv.br//bitstream/123456789/12748/2/license.txt https://locus.ufv.br//bitstream/123456789/12748/3/document.pdf.jpg |
bitstream.checksum.fl_str_mv |
63e53fc4113f92e165abb0ac2941d82f 8a4605be74aa9ea9d79846c1fba20a33 08e496d98aefd4a1d7dfd153e7f2ad82 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 |
repository.name.fl_str_mv |
LOCUS Repositório Institucional da UFV - Universidade Federal de Viçosa (UFV) |
repository.mail.fl_str_mv |
fabiojreis@ufv.br |
_version_ |
1801213058027094016 |