Information system for image classification based on frequency curve proximity
Main author: | Sánchez, L. |
---|---|
Publication date: | 2017 |
Other authors: | Alfonso-Cendón, Javier; Oliveira, T.; Ordieres-Meré, Joaquín B.; Castejón Limas, Manuel; Novais, Paulo |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/1822/50528 |
Abstract: | With the size that digital collections are currently reaching, retrieving the best match for a document from a large collection by comparing hundreds of tags involves considerable algorithmic complexity, even more so if the number of tags in the collection is not fixed. For these cases, similarity search appears to be the best retrieval method, but techniques suited to these conditions are lacking. This work combines machine learning algorithms to find the object most similar to a given one in a set of pre-processed objects, based only on their metadata tags. The algorithm represents objects as character frequency curves and is capable of finding relationships between objects that have no apparent association. The search can also be parallelized using MapReduce strategies. The method can be applied to a wide variety of documents with metadata tags. The case study used in this work to demonstrate the similarity search technique is a collection of image objects in JavaScript Object Notation (JSON) containing metadata tags. |
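The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the curves are built, which distance is used, or how the MapReduce job is organized. The names `frequency_curve`, `curve_distance`, and `most_similar`, the use of relative alphanumeric character frequencies, the Euclidean distance, and the in-process `map`/`reduce` calls are all assumptions made for illustration.

```python
import json
from collections import Counter
from functools import reduce
from math import sqrt


def frequency_curve(tags: dict) -> dict:
    """Relative frequency of each alphanumeric character in the serialized tags.

    Assumption: the 'curve' is taken over the JSON text of the metadata.
    """
    text = json.dumps(tags, sort_keys=True, ensure_ascii=False).lower()
    counts = Counter(ch for ch in text if ch.isalnum())
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()} if total else {}


def curve_distance(a: dict, b: dict) -> float:
    """Euclidean distance between two curves (an assumed proximity measure)."""
    chars = set(a) | set(b)
    return sqrt(sum((a.get(c, 0.0) - b.get(c, 0.0)) ** 2 for c in chars))


def most_similar(query: dict, collection: dict) -> tuple:
    """Return (distance, object_id) of the closest object in a non-empty collection."""
    q = frequency_curve(query)
    # "Map" step: score every object independently of the others.
    scored = map(
        lambda item: (curve_distance(q, frequency_curve(item[1])), item[0]),
        collection.items(),
    )
    # "Reduce" step: keep the pair with the smallest distance.
    return reduce(min, scored)


# Hypothetical image metadata, invented for this example.
collection = {
    "img-a": {"camera": "Canon", "city": "Leon"},
    "img-b": {"camera": "Nikon", "city": "Braga"},
}
query = {"camera": "Canon", "city": "Leon"}
distance, best_id = most_similar(query, collection)
```

Because each object is scored independently in the map step, the scoring could in principle be distributed across workers, with the reduce step merging the per-worker minima. The curve does not depend on a fixed set of tag names, which matches the abstract's setting where the number of tags is not fixed.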
id |
RCAP_08885c7fe0fd15960a0dad5d665a2b98 |
---|---|
oai_identifier_str |
oai:repositorium.sdum.uminho.pt:1822/50528 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository_id_str |
7160 |
spelling |
This work has been done in the context of the project “ASASEC (Advisory System Against Sexual Exploitation of Children)” (HOME/2010/ISEC/AG/043) supported by the European Union with the program “Prevention and fight against crime”. |
dc.title.none.fl_str_mv |
Information system for image classification based on frequency curve proximity |
author |
Sánchez, L. |
author_facet |
Sánchez, L.; Alfonso-Cendón, Javier; Oliveira, T.; Ordieres-Meré, Joaquín B.; Castejón Limas, Manuel; Novais, Paulo |
author_role |
author |
author2 |
Alfonso-Cendón, Javier; Oliveira, T.; Ordieres-Meré, Joaquín B.; Castejón Limas, Manuel; Novais, Paulo |
author2_role |
author author author author author |
dc.contributor.none.fl_str_mv |
Universidade do Minho |
dc.subject.por.fl_str_mv |
Information system; Similarity search; Frequent itemset mining; Metadata; Image classification; Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática; Science & Technology |
publishDate |
2017 |
dc.date.none.fl_str_mv |
2017-03 2017-03-01T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/1822/50528 |
dc.language.iso.fl_str_mv |
eng |
dc.relation.none.fl_str_mv |
Sánchez L., Alfonso-Cendón J., Oliveira T., Ordieres-Meré J., Castejón Limas M., Novais P., Information system for image classification based on frequency curve proximity, Information Systems, Elsevier Science, Vol. 64, pp. 12–21, 2017. ISSN: 0306-4379. DOI: 10.1016/j.is.2016.08.001 (http://dx.doi.org/10.1016/j.is.2016.08.001) |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Elsevier |
dc.source.none.fl_str_mv |
reponame: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos); instname: Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron: RCAAP |