Query driven sequence pattern mining

Bibliographic details
Main author: Azevedo, Paulo J.
Publication date: 2006
Other authors: Ferreira, Pedro Gabriel
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/1822/6588
Abstract: The discovery of frequent patterns in biological sequences has many applications, including classification, clustering, and the understanding of sequence structure and function. This paper presents an algorithm that discovers frequent sequence patterns (motifs) present in a query sequence with respect to a database of sequences. The query guides the mining process, so only the patterns present in the query are reported. Two main types of patterns can be identified: flexible gap and rigid gap patterns. The user can choose to report all patterns or only maximal ones. Constraints and substitution sets are pushed directly into the mining process. Experimental evaluation shows the efficiency of the algorithm and the usefulness and relevance of the extracted patterns.
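Illustrative note: the sketch below is not the authors' algorithm. It is a minimal, brute-force illustration of the idea the abstract describes, namely that candidate patterns are taken only from the query and then checked for frequency against a database. All function names (`support`, `query_candidates`, `frequent_in_query`), the two-symbol pattern shape, and the use of regular expressions to encode gaps are assumptions made for this example. A rigid gap is written as a fixed number of `.` wildcards; a flexible gap could be written as a bounded repetition such as `.{0,1}`.

```python
import re


def support(pattern, database):
    """Fraction of database sequences that contain a match for the regex pattern."""
    regex = re.compile(pattern)
    return sum(1 for seq in database if regex.search(seq)) / len(database)


def query_candidates(query, max_gap=1):
    """Two-symbol candidates drawn from the query only (hypothetical helper).

    Each candidate is a pair of query symbols separated by a rigid gap of
    0..max_gap wildcard positions, encoded as '.' characters.
    """
    candidates = set()
    for i in range(len(query)):
        for gap in range(max_gap + 1):
            j = i + gap + 1
            if j < len(query):
                candidates.add(re.escape(query[i]) + "." * gap + re.escape(query[j]))
    return candidates


def frequent_in_query(query, database, min_support, max_gap=1):
    """Report the query-derived candidates whose support reaches min_support."""
    result = {}
    for pattern in sorted(query_candidates(query, max_gap)):
        s = support(pattern, database)
        if s >= min_support:
            result[pattern] = s
    return result


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    database = ["MKAVL", "MKTVL", "AKAVA", "GGGGG"]
    print(frequent_in_query("MKAV", database, min_support=0.5))
    # Flexible gap between K and V: zero or one arbitrary symbol.
    print(support("K.{0,1}V", database))
```

Because candidates come only from the query, patterns that are frequent in the database but absent from the query are never generated. This sketch does not cover maximal-pattern filtering, constraints, or substitution sets.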
id RCAP_64930df986ab02b97dfc3eaf1ba981fd
oai_identifier_str oai:repositorium.sdum.uminho.pt:1822/6588
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Query driven sequence pattern mining
author Azevedo, Paulo J.
author_facet Azevedo, Paulo J.
Ferreira, Pedro Gabriel
author_role author
author2 Ferreira, Pedro Gabriel
author2_role author
dc.contributor.none.fl_str_mv Universidade do Minho
dc.contributor.author.fl_str_mv Azevedo, Paulo J.
Ferreira, Pedro Gabriel
dc.subject.por.fl_str_mv Bioinformatics
Databases
publishDate 2006
dc.date.none.fl_str_mv 2006
2006-01-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/1822/6588
url http://hdl.handle.net/1822/6588
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv SIMPÓSIO BRASILEIRO DE BANCO DE DADOS, 21, Florianópolis, 2006 – “Simpósio Brasileiro de Banco de Dados : Anais”. [S.l. : s.n., 2006].
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
_version_ 1799132722578325504