Information system for image classification based on frequency curve proximity

Bibliographic details
Main author: Sánchez, L.
Publication date: 2017
Other authors: Alfonso-Cendón, Javier; Oliveira, T.; Ordieres-Meré, Joaquín B.; Castejón Limas, Manuel; Novais, Paulo
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/1822/50528
Abstract: Given the size that digital collections are now reaching, retrieving the best match for a document from a large collection by comparing hundreds of tags involves considerable algorithmic complexity, even more so if the number of tags in the collection is not fixed. For these cases, similarity search appears to be the best retrieval method, but there is a lack of techniques suited to these conditions. This work presents a combination of machine learning algorithms that finds the object most similar to a given one in a set of pre-processed objects, based only on their metadata tags. The algorithm represents objects as character frequency curves and is capable of finding relationships between objects without an apparent association. The search can also be parallelized using MapReduce strategies. This method can be applied to a wide variety of documents with metadata tags. The case study used in this work to demonstrate the similarity search technique is a collection of image objects in JavaScript Object Notation (JSON) containing metadata tags.
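The abstract does not spell out how the frequency curves are built or compared, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a curve is the relative frequency of each alphanumeric character in an object's serialized JSON metadata and that proximity is the Euclidean distance between two curves. The function names (frequency_curve, curve_distance, most_similar) and the sample tags are hypothetical.

```python
import json
import math
from collections import Counter


def frequency_curve(obj):
    """Relative frequency of each alphanumeric character in the object's
    serialized metadata (an assumed definition of the 'curve')."""
    text = json.dumps(obj, sort_keys=True, ensure_ascii=False).lower()
    counts = Counter(ch for ch in text if ch.isalnum())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: n / total for ch, n in counts.items()}


def curve_distance(a, b):
    """Euclidean distance between two curves (an assumed proximity measure).
    Characters missing from one curve count as frequency 0."""
    chars = set(a) | set(b)
    return math.sqrt(sum((a.get(c, 0.0) - b.get(c, 0.0)) ** 2 for c in chars))


def most_similar(query, collection):
    """Return (index, distance) of the collection object closest to the query."""
    if not collection:
        raise ValueError("collection must not be empty")
    q = frequency_curve(query)
    # "Map" step: one independent distance per object, so it could be distributed.
    distances = [curve_distance(q, frequency_curve(o)) for o in collection]
    # "Reduce" step: keep the index with the smallest distance.
    best = min(range(len(collection)), key=distances.__getitem__)
    return best, distances[best]


if __name__ == "__main__":
    # Hypothetical image metadata objects.
    images = [
        {"tags": ["beach", "sunset", "sea"]},
        {"tags": ["server", "rack", "cable"]},
    ]
    print(most_similar({"tags": ["sunset", "beach"]}, images))
```

Because each distance is computed independently of the others, the list comprehension plays the role of a map phase and the final min plays the role of a reduce phase. This is a sequential stand-in for the MapReduce parallelization the abstract mentions.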
id RCAP_08885c7fe0fd15960a0dad5d665a2b98
oai_identifier_str oai:repositorium.sdum.uminho.pt:1822/50528
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
Funding: This work has been done in the context of the project “ASASEC (Advisory System Against Sexual Exploitation of Children)” (HOME/2010/ISEC/AG/043) supported by the European Union with the program “Prevention and fight against crime”.
dc.title.none.fl_str_mv Information system for image classification based on frequency curve proximity
title Information system for image classification based on frequency curve proximity
author Sánchez, L.
author_facet Sánchez, L.
Alfonso-Cendón, Javier
Oliveira, T.
Ordieres-Meré, Joaquín B.
Castejón Limas, Manuel
Novais, Paulo
author_role author
author2 Alfonso-Cendón, Javier
Oliveira, T.
Ordieres-Meré, Joaquín B.
Castejón Limas, Manuel
Novais, Paulo
author2_role author
author
author
author
author
dc.contributor.none.fl_str_mv Universidade do Minho
dc.contributor.author.fl_str_mv Sánchez, L.
Alfonso-Cendón, Javier
Oliveira, T.
Ordieres-Meré, Joaquín B.
Castejón Limas, Manuel
Novais, Paulo
dc.subject.por.fl_str_mv Information system
Similarity search
Frequent itemset mining
Metadata
Image classification
Engineering and Technology::Electrical Engineering, Electronic Engineering, Information Engineering
Science & Technology
topic Information system
Similarity search
Frequent itemset mining
Metadata
Image classification
Engineering and Technology::Electrical Engineering, Electronic Engineering, Information Engineering
Science & Technology
publishDate 2017
dc.date.none.fl_str_mv 2017-03
2017-03-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/1822/50528
url http://hdl.handle.net/1822/50528
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Sánchez L., Alfonso-Cendón J., Oliveira T., Ordieres-Meré J., Castejón Limas M., Novais P., Information system for image classification based on frequency curve proximity, Information Systems, Elsevier Science, ISSN: 0306-4379, Vol. 64, pp. 12–21, 2017. http://dx.doi.org/10.1016/j.is.2016.08.001
0306-4379
10.1016/j.is.2016.08.001
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Elsevier
publisher.none.fl_str_mv Elsevier
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação