A deep learning approach to identify not suitable for work images

Bibliographic details
Main author: Bicho, Daniel
Publication date: 2020
Other authors: J. Ferreira, Artur, Datia, Nuno
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10400.21/12354
Abstract: Web Archiving (WA) deals with the preservation of portions of the World Wide Web (WWW), keeping them available for the future. Arquivo.pt is a WA initiative holding a huge amount of content, including image files. However, some of these images contain nudity and pornography, which can be offensive to users and are thus Not Suitable For Work (NSFW). This work proposes a solution to classify NSFW images found at Arquivo.pt using deep neural network approaches. A large dataset of images is built from Arquivo.pt data, and two pre-trained neural network models, namely ResNet and SqueezeNet, are evaluated and improved for the NSFW classification task using this dataset. The initial evaluation of these models reported accuracies of 93% and 72%, respectively. After a fine-tuning stage, the accuracies improved to 94% and 89%, respectively. The proposed solution is integrated into the Arquivo.pt Image Search System, available at https://arquivo.pt/images.jsp.
id RCAP_1fb38df67481d6d87f7d2b6337b188d0
oai_identifier_str oai:repositorio.ipl.pt:10400.21/12354
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
repository_id_str 7160
dc.contributor.none.fl_str_mv RCIPL
topic Deep learning
Deep neural networks
Image classification
Not suitable for work images
ResNet
SqueezeNet
publishDate 2020
dc.date.none.fl_str_mv 2020-11-06T12:05:51Z
2020
2020-01-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10400.21/12354
url http://hdl.handle.net/10400.21/12354
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv BICHO, Daniel; FERREIRA, Artur; DATIA, Nuno – A deep learning approach to identify not suitable for work images. i-ETC: ISEL Academic Journal of Electronics, Telecommunications and Computers. ISSN 2182-4010. Vol. 6, N.º 1 (2020) ID-3, pp. 1-11
2182-4010
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv ISEL
publisher.none.fl_str_mv ISEL
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP