A deep learning approach to identify not suitable for work images
Main author: | Bicho, Daniel |
---|---|
Publication date: | 2020 |
Other authors: | J. Ferreira, Artur; Datia, Nuno |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10400.21/12354 |
Abstract: | Web Archiving (WA) deals with the preservation of portions of the World Wide Web (WWW), keeping them available for the future. Arquivo.pt is a WA initiative holding a huge amount of content, including image files. However, some of these images contain nudity and pornography, which can be offensive to users and are thus Not Suitable For Work (NSFW). This work proposes a solution to classify NSFW images found at Arquivo.pt using deep neural network approaches. A large dataset of images is built from Arquivo.pt data, and two pre-trained neural network models, namely ResNet and SqueezeNet, are evaluated and improved for the NSFW classification task using this dataset. The evaluation of these models reported an accuracy of 93% and 72%, respectively. After a fine-tuning stage, the accuracy of these models improved to 94% and 89%, respectively. The proposed solution is integrated into the Arquivo.pt Image Search System, available at https://arquivo.pt/images.jsp. |
id |
RCAP_1fb38df67481d6d87f7d2b6337b188d0 |
---|---|
oai_identifier_str |
oai:repositorio.ipl.pt:10400.21/12354 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
A deep learning approach to identify not suitable for work images; Deep learning; Deep neural networks; Image classification; Not suitable for work images; ResNet; SqueezeNet; Web Archiving (WA) deals with the preservation of portions of the World Wide Web (WWW), keeping them available for the future. Arquivo.pt is a WA initiative holding a huge amount of content, including image files. However, some of these images contain nudity and pornography, which can be offensive to users and are thus Not Suitable For Work (NSFW). This work proposes a solution to classify NSFW images found at Arquivo.pt using deep neural network approaches. A large dataset of images is built from Arquivo.pt data, and two pre-trained neural network models, namely ResNet and SqueezeNet, are evaluated and improved for the NSFW classification task using this dataset. The evaluation of these models reported an accuracy of 93% and 72%, respectively. After a fine-tuning stage, the accuracy of these models improved to 94% and 89%, respectively. The proposed solution is integrated into the Arquivo.pt Image Search System, available at https://arquivo.pt/images.jsp.; ISEL; RCIPL; Bicho, Daniel; J. Ferreira, Artur; Datia, Nuno; 2020-11-06T12:05:51Z; 2020; 2020-01-01T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article; application/pdf; http://hdl.handle.net/10400.21/12354; eng; BICHO, Daniel; FERREIRA, Artur; DATIA, Nuno – A deep learning approach to identify not suitable for work images. i-ETC: ISEL Academic Journal of Electronics, Telecommunications and Computers. ISSN 2182-4010. Vol. 6, N.º 1 (2020) ID-3, pp. 1-11; 2182-4010; info:eu-repo/semantics/openAccess; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2023-08-03T10:05:05Z; oai:repositorio.ipl.pt:10400.21/12354; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-19T20:20:27.661524; Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false |
title |
A deep learning approach to identify not suitable for work images |
author |
Bicho, Daniel |
author_facet |
Bicho, Daniel; J. Ferreira, Artur; Datia, Nuno |
author_role |
author |
author2 |
J. Ferreira, Artur; Datia, Nuno |
author2_role |
author; author |
dc.contributor.none.fl_str_mv |
RCIPL |
dc.contributor.author.fl_str_mv |
Bicho, Daniel; J. Ferreira, Artur; Datia, Nuno |
dc.subject.por.fl_str_mv |
Deep learning; Deep neural networks; Image classification; Not suitable for work images; ResNet; SqueezeNet |
topic |
Deep learning; Deep neural networks; Image classification; Not suitable for work images; ResNet; SqueezeNet |
publishDate |
2020 |
dc.date.none.fl_str_mv |
2020-11-06T12:05:51Z; 2020; 2020-01-01T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10400.21/12354 |
url |
http://hdl.handle.net/10400.21/12354 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
BICHO, Daniel; FERREIRA, Artur; DATIA, Nuno – A deep learning approach to identify not suitable for work images. i-ETC: ISEL Academic Journal of Electronics, Telecommunications and Computers. ISSN 2182-4010. Vol. 6, N.º 1 (2020) ID-3, pp. 1-11; 2182-4010 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
ISEL |
publisher.none.fl_str_mv |
ISEL |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
_version_ |
1799133473467793408 |