Disclosing ambiguous gene aliases by automatic literature profiling
Main author: | Coimbra, Roney Santos |
---|---|
Publication date: | 2010 |
Other authors: | Vanderwall, Dana E; Oliveira, Guilherme Corrêa |
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da FIOCRUZ (ARCA) |
Full text: | https://www.arca.fiocruz.br/handle/icict/9378 |
Summary: | Fundação Oswaldo Cruz. Centro de Pesquisa René Rachou. Centro de Excelência em Bioinformática. Belo Horizonte, MG, Brasil/Fundação Oswaldo Cruz. Centro de Pesquisa René Rachou. Grupo de Genômica e Biologia Computacional. Belo Horizonte, MG, Brasil |
id |
CRUZ_a447377a0914a50730ae8d27deaf5f6a |
---|---|
oai_identifier_str |
oai:www.arca.fiocruz.br:icict/9378 |
network_acronym_str |
CRUZ |
network_name_str |
Repositório Institucional da FIOCRUZ (ARCA) |
repository_id_str |
2135 |
spelling |
COIMBRA, Roney Santos; VANDERWALL, Dana E; OLIVEIRA, Guilherme Corrêa. Disclosing ambiguous gene aliases by automatic literature profiling. BMC Genomics, 11(suppl.5):s3, 2010. ISSN 1471-2164. DOI 10.1186/1471-2164-11-S5-S3. https://www.arca.fiocruz.br/handle/icict/9378
Affiliations (in the order listed in the record): Fundação Oswaldo Cruz. Centro de Pesquisa René Rachou. Centro de Excelência em Bioinformática. Belo Horizonte, MG, Brasil/Fundação Oswaldo Cruz. Centro de Pesquisa René Rachou. Grupo de Genômica e Biologia Computacional. Belo Horizonte, MG, Brasil; GlaxoSmithKline Moore Dr. Molecular Discovery Research. Research Triangle Park, NC, USA; Fundação Oswaldo Cruz. Centro de Pesquisa René Rachou. Centro de Excelência em Bioinformática. Belo Horizonte, MG, Brasil/Fundação Oswaldo Cruz. Centro de Pesquisa René Rachou. Grupo de Genômica e Biologia Computacional. Belo Horizonte, MG, Brasil
Background: Retrieving pertinent information from biological scientific literature requires cutting-edge text mining methods that can recognize the meaning of the very ambiguous names of biological entities. Aliases of a gene share a common vocabulary in their respective collections of PubMed abstracts. This may be true even when these aliases are not associated with the same subset of documents. This gene-specific vocabulary defines a unique fingerprint that can be used to disclose ambiguous aliases. The present work describes an original method for automatically assessing the ambiguity levels of gene aliases in large gene terminologies based exclusively on the content of their associated literature. The method can deal with the two major problems restricting the usage of current text mining tools: 1) different names associated with the same gene; and 2) one name associated with multiple genes, or even with non-gene entities. Importantly, this method does not require training examples.
Results: Aliases were considered “ambiguous” when their Jaccard distance to the respective official gene symbol was equal to or greater than the smallest distance between the official gene symbol and one of the three internal controls (randomly picked unrelated official gene symbols). Otherwise, they were assigned the status of “synonyms”. We evaluated the coherence of the results by comparing the frequencies of the official gene symbols in the text corpora retrieved with their respective “synonyms” or “ambiguous” aliases. Official gene symbols were mentioned in the abstract collections of 42 % (70/165) of their respective synonyms. No official gene symbol occurred in the abstract collections of any of their respective ambiguous aliases. Overall, querying PubMed with official gene symbols and “synonym” aliases allowed a 3.6-fold increase in the number of unique documents retrieved.
Conclusions: These results confirm that this method is able to distinguish between synonyms and ambiguous gene aliases based exclusively on their vocabulary fingerprint. The approach we describe could be used to enhance the retrieval of relevant literature related to a gene.
Keywords: data mining; scientific literature; abstracts |
dc.title.pt_BR.fl_str_mv |
Disclosing ambiguous gene aliases by automatic literature profiling |
title |
Disclosing ambiguous gene aliases by automatic literature profiling |
spellingShingle |
Disclosing ambiguous gene aliases by automatic literature profiling Coimbra, Roney Santos data mining scientific literature abstracts |
title_short |
Disclosing ambiguous gene aliases by automatic literature profiling |
title_full |
Disclosing ambiguous gene aliases by automatic literature profiling |
title_fullStr |
Disclosing ambiguous gene aliases by automatic literature profiling |
title_full_unstemmed |
Disclosing ambiguous gene aliases by automatic literature profiling |
title_sort |
Disclosing ambiguous gene aliases by automatic literature profiling |
author |
Coimbra, Roney Santos |
author_facet |
Coimbra, Roney Santos; Vanderwall, Dana E; Oliveira, Guilherme Corrêa |
author_role |
author |
author2 |
Vanderwall, Dana E; Oliveira, Guilherme Corrêa |
author2_role |
author author |
dc.contributor.author.fl_str_mv |
Coimbra, Roney Santos; Vanderwall, Dana E; Oliveira, Guilherme Corrêa |
dc.subject.en.pt_BR.fl_str_mv |
data mining; scientific literature; abstracts |
topic |
data mining; scientific literature; abstracts |
description |
Fundação Oswaldo Cruz. Centro de Pesquisa René Rachou. Centro de Excelência em Bioinformática. Belo Horizonte, MG, Brasil/Fundação Oswaldo Cruz. Centro de Pesquisa René Rachou. Grupo de Genômica e Biologia Computacional. Belo Horizonte, MG, Brasil |
publishDate |
2010 |
dc.date.issued.fl_str_mv |
2010 |
dc.date.accessioned.fl_str_mv |
2015-01-14T11:01:59Z |
dc.date.available.fl_str_mv |
2015-01-14T11:01:59Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
COIMBRA, Roney Santos; VANDERWALL, Dana E; OLIVEIRA, Guilherme Corrêa. Disclosing ambiguous gene aliases by automatic literature profiling. BMC Genomics, 11(suppl.5):s3, 2010. |
dc.identifier.uri.fl_str_mv |
https://www.arca.fiocruz.br/handle/icict/9378 |
dc.identifier.issn.none.fl_str_mv |
1471-2164 |
dc.identifier.doi.none.fl_str_mv |
10.1186/1471-2164-11-S5-S3 |
identifier_str_mv |
COIMBRA, Roney Santos; VANDERWALL, Dana E; OLIVEIRA, Guilherme Corrêa. Disclosing ambiguous gene aliases by automatic literature profiling. BMC Genomics, 11(suppl.5):s3, 2010. 1471-2164 10.1186/1471-2164-11-S5-S3 |
url |
https://www.arca.fiocruz.br/handle/icict/9378 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.publisher.none.fl_str_mv |
BioMed Central |
publisher.none.fl_str_mv |
BioMed Central |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da FIOCRUZ (ARCA) instname:Fundação Oswaldo Cruz (FIOCRUZ) instacron:FIOCRUZ |
instname_str |
Fundação Oswaldo Cruz (FIOCRUZ) |
instacron_str |
FIOCRUZ |
institution |
FIOCRUZ |
reponame_str |
Repositório Institucional da FIOCRUZ (ARCA) |
collection |
Repositório Institucional da FIOCRUZ (ARCA) |
bitstream.url.fl_str_mv |
https://www.arca.fiocruz.br/bitstream/icict/9378/1/license.txt https://www.arca.fiocruz.br/bitstream/icict/9378/2/Disclosing%20ambiguous%20gene%20aliases%20by%20automatic.pdf https://www.arca.fiocruz.br/bitstream/icict/9378/3/Disclosing%20ambiguous%20gene%20aliases%20by%20automatic.pdf.txt |
bitstream.checksum.fl_str_mv |
7d48279ffeed55da8dfe2f8e81f3b81f ce54aa2c4ea49eb989f9e7308d827ce6 cef612ba271ba78030b89f8f4e7237a5 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da FIOCRUZ (ARCA) - Fundação Oswaldo Cruz (FIOCRUZ) |
repository.mail.fl_str_mv |
repositorio.arca@fiocruz.br |
_version_ |
1813008942580629504 |
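
The classification rule stated in the abstract of this record (an alias is "ambiguous" when its Jaccard distance to the official gene symbol is equal to or greater than the smallest distance between that symbol and any of three unrelated control symbols) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say how vocabularies are extracted from PubMed abstracts, so the sketch assumes each name is already represented as a plain set of terms. The function names and the toy vocabularies are hypothetical.

```python
def jaccard_distance(a: set[str], b: set[str]) -> float:
    """Return 1 - |A ∩ B| / |A ∪ B| (taken as 0.0 when both sets are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def classify_alias(
    alias_vocab: set[str],
    official_vocab: set[str],
    control_vocabs: list[set[str]],
) -> str:
    """Label an alias 'ambiguous' or 'synonym' using the rule from the abstract.

    The threshold is the smallest distance between the official symbol and
    any control (the abstract uses three randomly picked unrelated symbols).
    """
    threshold = min(jaccard_distance(official_vocab, c) for c in control_vocabs)
    distance = jaccard_distance(alias_vocab, official_vocab)
    return "ambiguous" if distance >= threshold else "synonym"


# Toy vocabularies (invented for illustration only).
official = {"kinase", "phosphorylation", "apoptosis", "tumor", "signaling"}
close_alias = {"kinase", "phosphorylation", "apoptosis", "signaling", "cell"}
distant_alias = {"membrane", "transport", "ion", "channel", "tumor"}
controls = [
    {"collagen", "matrix", "fibroblast", "tumor", "signaling"},
    {"ribosome", "translation", "rna"},
    {"insulin", "glucose", "signaling"},
]

print(classify_alias(close_alias, official, controls))
print(classify_alias(distant_alias, official, controls))
```

Using the controls as the threshold means no fixed cutoff or training set is needed: an alias only counts as a synonym if its literature looks more like the official symbol's literature than that of an unrelated gene does.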