A Deep Learning Approach to Identify Not Suitable for Work Images

Bibliographic details
Main author: Bicho, Daniel
Publication date: 2020
Other authors: Ferreira, Artur; Datia, Nuno
Document type: Article
Language: English
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: https://doi.org/10.34629/ipl.isel.i-ETC.80
Abstract: Web Archiving (WA) deals with preserving portions of the World Wide Web (WWW) so that they remain available for future access. Arquivo.pt is a WA initiative holding a huge amount of content, including image files. However, some of these images contain nudity and pornography, which can be offensive to users and are thus Not Suitable for Work (NSFW). This work proposes a methodology to classify NSFW images available at Arquivo.pt using deep neural network approaches. A large image dataset is built from Arquivo.pt data. Two pre-trained neural network models, namely ResNet and SqueezeNet, are evaluated on this dataset and then improved for the NSFW classification task. The initial evaluation of these models reported accuracies of 93% and 72%, respectively. After a fine-tuning stage, their accuracies improved to 94% and 89%, respectively. The proposed solution is integrated into the Arquivo.pt Image Search System, enabling the filtering of problematic NSFW images. At the time of writing, the proposed solution is in production at https://arquivo.pt/images.jsp
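The percentage-point gains in the abstract hide how differently fine-tuning affected the two models. The following minimal sketch does the arithmetic using only the four reported accuracies. It assumes that error rate is simply 100% minus accuracy, because the abstract gives no per-class figures. The names `REPORTED_ACCURACY`, `error_rate` and `summarize` are illustrative and do not come from the paper.

```python
# Accuracies reported in the abstract, in percent: (before, after) fine-tuning.
REPORTED_ACCURACY = {
    "ResNet": (93.0, 94.0),
    "SqueezeNet": (72.0, 89.0),
}


def error_rate(accuracy_pct: float) -> float:
    """Error rate in percent, assuming error = 100% - accuracy."""
    return 100.0 - accuracy_pct


def summarize(before: float, after: float) -> dict:
    """Absolute gain and relative error reduction from fine-tuning."""
    err_before = error_rate(before)
    err_after = error_rate(after)
    return {
        "gain_pp": after - before,  # gain in percentage points
        "error_before": err_before,
        "error_after": err_after,
        # Fraction of the original errors removed by fine-tuning.
        "relative_error_reduction": (err_before - err_after) / err_before,
    }


if __name__ == "__main__":
    for model, (before, after) in REPORTED_ACCURACY.items():
        s = summarize(before, after)
        print(
            f"{model}: +{s['gain_pp']:.0f} pp, "
            f"error {s['error_before']:.0f}% -> {s['error_after']:.0f}%, "
            f"relative error reduction {s['relative_error_reduction']:.1%}"
        )
```

Read this way, fine-tuning removed roughly 14% of ResNet's errors but about 61% of SqueezeNet's. That narrows the accuracy gap between the two models from 21 to 5 percentage points.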
Record ID: RCAP_95f10f83d45607e216af7e96219ebb4b
OAI identifier: oai:i-ETC.journals.isel.pt:article/80
Subjects: Computers; Informatics; Multimedia
Keywords: Deep Learning; Deep Neural Networks; Image Classification; Not Suitable for Work Images; ResNet; SqueezeNet; Redis Message Queue
Published in: i-ETC: ISEL Academic Journal of Electronics Telecommunications and Computers, Vol 6, No 1 (2020): Volume 6; ID-3 (ISSN 2182-4010)
Publisher: ISEL - High Institute of Engineering of Lisbon
Publication date (exact): 2020-10-16
Status: published version (journal article, application/pdf)
Article page: http://journals.isel.pt/index.php/i-ETC/article/view/80
Full text (journal site): http://journals.isel.pt/index.php/i-ETC/article/view/80/67
Rights: Copyright (c) 2020 Artur Ferreira, Daniel Bicho; licensed under http://creativecommons.org/licenses/by-nc/4.0 (open access)
Repository ID: 7160 (RCAAP, operated by Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação)