A Deep Learning Approach to Identify Not Suitable for Work Images
Main author: | Bicho, Daniel |
---|---|
Publication date: | 2020 |
Other authors: | Ferreira, Artur; Datia, Nuno |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | https://doi.org/10.34629/ipl.isel.i-ETC.80 |
Abstract: | Web Archiving (WA) deals with the preservation of portions of the World Wide Web (WWW), allowing their availability for future access. Arquivo.pt is a WA initiative holding a huge amount of content, including image files. However, some of these images contain nudity and pornography, which can be offensive to users and are thus Not Suitable for Work (NSFW). This work proposes a methodology to classify NSFW images available at Arquivo.pt, using deep neural network approaches. A large dataset of images is built using Arquivo.pt data, and two pre-trained neural network models, namely ResNet and SqueezeNet, are evaluated and improved for the NSFW classification task using this dataset. The evaluation of these models reported an accuracy of 93% and 72%, respectively. After a fine-tuning stage, the accuracy of these models improved to 94% and 89%, respectively. The proposed solution is integrated into the Arquivo.pt Image Search System, enabling the filtering of problematic NSFW images. At the time of this writing, the proposed solution is in production at https://arquivo.pt/images.jsp |
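
The abstract mentions two operations that a short sketch may make more concrete: measuring classification accuracy and filtering NSFW images out of search results. The Python below is only an illustration under stated assumptions, not the authors' implementation. The `ImageResult` type, the `nsfw_score` field (a classifier output assumed to lie in [0, 1]), the 0.5 threshold, and the function names are all hypothetical and do not come from the paper or from Arquivo.pt.

```python
from dataclasses import dataclass


@dataclass
class ImageResult:
    """One image search hit (hypothetical type).

    `nsfw_score` is an assumed classifier output in [0, 1];
    higher means more likely NSFW.
    """
    url: str
    nsfw_score: float


def filter_safe(results, threshold=0.5):
    """Keep only results whose assumed NSFW score is below the threshold."""
    return [r for r in results if r.nsfw_score < threshold]


def accuracy(scores, labels, threshold=0.5):
    """Fraction of images whose thresholded score matches the label (1 = NSFW)."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("need at least one example")
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(scores)
```

In this sketch, a score equal to the threshold counts as NSFW, so borderline images are filtered out. Whether that is the right trade-off depends on how costly a missed NSFW image is compared with a wrongly hidden safe one.
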
id |
RCAP_95f10f83d45607e216af7e96219ebb4b |
---|---|
oai_identifier_str |
oai:i-ETC.journals.isel.pt:article/80 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
A Deep Learning Approach to Identify Not Suitable for Work Images |
title |
A Deep Learning Approach to Identify Not Suitable for Work Images |
title_short |
A Deep Learning Approach to Identify Not Suitable for Work Images |
title_full |
A Deep Learning Approach to Identify Not Suitable for Work Images |
title_fullStr |
A Deep Learning Approach to Identify Not Suitable for Work Images |
title_full_unstemmed |
A Deep Learning Approach to Identify Not Suitable for Work Images |
title_sort |
A Deep Learning Approach to Identify Not Suitable for Work Images |
author |
Bicho, Daniel |
author_facet |
Bicho, Daniel; Ferreira, Artur; Datia, Nuno |
author_role |
author |
author2 |
Ferreira, Artur; Datia, Nuno |
author2_role |
author author |
dc.contributor.author.fl_str_mv |
Bicho, Daniel; Ferreira, Artur; Datia, Nuno |
dc.subject.por.fl_str_mv |
Computers; Informatics; Multimedia; Deep Learning; Deep Neural Networks; Image Classification; Not Suitable for Work Images; ResNet; SqueezeNet; Redis Message Queue |
topic |
Computers; Informatics; Multimedia; Deep Learning; Deep Neural Networks; Image Classification; Not Suitable for Work Images; ResNet; SqueezeNet; Redis Message Queue |
description |
Web Archiving (WA) deals with the preservation of portions of the World Wide Web (WWW), allowing their availability for future access. Arquivo.pt is a WA initiative holding a huge amount of content, including image files. However, some of these images contain nudity and pornography, which can be offensive to users and are thus Not Suitable for Work (NSFW). This work proposes a methodology to classify NSFW images available at Arquivo.pt, using deep neural network approaches. A large dataset of images is built using Arquivo.pt data, and two pre-trained neural network models, namely ResNet and SqueezeNet, are evaluated and improved for the NSFW classification task using this dataset. The evaluation of these models reported an accuracy of 93% and 72%, respectively. After a fine-tuning stage, the accuracy of these models improved to 94% and 89%, respectively. The proposed solution is integrated into the Arquivo.pt Image Search System, enabling the filtering of problematic NSFW images. At the time of this writing, the proposed solution is in production at https://arquivo.pt/images.jsp |
publishDate |
2020 |
dc.date.none.fl_str_mv |
2020-10-16T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://doi.org/10.34629/ipl.isel.i-ETC.80 oai:i-ETC.journals.isel.pt:article/80 |
url |
https://doi.org/10.34629/ipl.isel.i-ETC.80 |
identifier_str_mv |
oai:i-ETC.journals.isel.pt:article/80 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
http://journals.isel.pt/index.php/i-ETC/article/view/80 https://doi.org/10.34629/ipl.isel.i-ETC.80 http://journals.isel.pt/index.php/i-ETC/article/view/80/67 |
dc.rights.driver.fl_str_mv |
Copyright (c) 2020 Artur Ferreira, Daniel Bicho http://creativecommons.org/licenses/by-nc/4.0 info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Copyright (c) 2020 Artur Ferreira, Daniel Bicho http://creativecommons.org/licenses/by-nc/4.0 |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
ISEL - High Institute of Engineering of Lisbon |
publisher.none.fl_str_mv |
ISEL - High Institute of Engineering of Lisbon |
dc.source.none.fl_str_mv |
i-ETC : ISEL Academic Journal of Electronics Telecommunications and Computers; Vol 6, No 1 (2020): Volume 6; ID-3 i-ETC : ISEL Academic Journal of Electronics Telecommunications and Computers; Vol 6, No 1 (2020): Volume 6; ID-3 2182-4010 reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1817550457284853760 |