Deep learning for activity recognition using audio and video
Main author: | Reinolds, Francisco |
---|---|
Publication date: | 2022 |
Other authors: | Neto, Cristiana; Machado, José Manuel |
Document type: | Article |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | https://hdl.handle.net/1822/78007 |
Abstract: | Neural networks have established themselves as powerhouses in what concerns several types of detection, ranging from human activities to their emotions. Several types of analysis exist, and the most popular and successful is video. However, there are other kinds of analysis, which, despite not being used as often, are still promising. In this article, a comparison between audio and video analysis is drawn in an attempt to classify violence detection in real-time streams. This study, which followed the CRISP-DM methodology, made use of several models available through PyTorch in order to test a diverse set of models and achieve robust results. The results obtained proved why video analysis has such prevalence, with the video classification handily outperforming its audio classification counterpart. Whilst the audio models attained on average 76% accuracy, video models secured average scores of 89%, showing a significant difference in performance. This study concluded that the applied methods are quite promising in detecting violence, using both audio and video. |
id |
RCAP_6bcc966168adeffb5013d10b5b2ef278 |
---|---|
oai_identifier_str |
oai:repositorium.sdum.uminho.pt:1822/78007 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
This work has been supported by FCT-Fundacao para a Ciencia e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020 and the project "Integrated and Innovative Solutions for the well-being of people in complex urban centers" within the Project Scope NORTE-01-0145-FEDER000086. C.N. thanks the FCT-Fundacao para a Ciencia e Tecnologia for the grant 2021.06507.BD. |
2023-07-21T12:36:33Z; oai:repositorium.sdum.uminho.pt:1822/78007; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-19T19:32:40.537609 |
dc.title.none.fl_str_mv |
Deep learning for activity recognition using audio and video |
author |
Reinolds, Francisco |
author_facet |
Reinolds, Francisco; Neto, Cristiana; Machado, José Manuel |
author_role |
author |
author2 |
Neto, Cristiana; Machado, José Manuel |
author2_role |
author author |
dc.contributor.none.fl_str_mv |
Universidade do Minho |
dc.contributor.author.fl_str_mv |
Reinolds, Francisco; Neto, Cristiana; Machado, José Manuel |
dc.subject.por.fl_str_mv |
action recognition; violence detection; real-time video stream; neural networks; audio classifiers; video classifiers; Science & Technology |
topic |
action recognition; violence detection; real-time video stream; neural networks; audio classifiers; video classifiers; Science & Technology |
publishDate |
2022 |
dc.date.none.fl_str_mv |
2022-03 2022-03-01T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://hdl.handle.net/1822/78007 |
url |
https://hdl.handle.net/1822/78007 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
Reinolds, F.; Neto, C.; Machado, J. Deep Learning for Activity Recognition Using Audio and Video. Electronics 2022, 11, 782. https://doi.org/10.3390/electronics11050782; ISSN 2079-9292; DOI 10.3390/electronics11050782; https://www.mdpi.com/2079-9292/11/5/782 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
MDPI |
publisher.none.fl_str_mv |
MDPI |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799132839604649984 |
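
The `oai_identifier_str` value in this record follows the OAI-PMH identifier convention, so the record could in principle be requested from an OAI-PMH endpoint with a `GetRecord` call. The sketch below only builds the request URL and makes no network call. It assumes the base URL `https://www.rcaap.pt/oai/openaire` that appears in this record's harvest metadata is still the endpoint (it may have changed) and that the endpoint serves the `oai_dc` format, which the OAI-PMH specification requires of every repository. The function name `build_get_record_url` is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: base URL copied from this record's harvest metadata; it may be outdated.
OAI_BASE_URL = "https://www.rcaap.pt/oai/openaire"


def build_get_record_url(identifier: str,
                         metadata_prefix: str = "oai_dc",
                         base_url: str = OAI_BASE_URL) -> str:
    """Build an OAI-PMH GetRecord request URL (hypothetical helper, no network access)."""
    query = urlencode({
        "verb": "GetRecord",              # OAI-PMH verb for fetching a single record
        "identifier": identifier,         # the record's OAI identifier
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Identifier taken from the oai_identifier_str field of this record.
url = build_get_record_url("oai:repositorium.sdum.uminho.pt:1822/78007")

# Round-trip check: decode the query string back into its parameters.
params = parse_qs(urlsplit(url).query)
print(url)
print(params["identifier"][0])
```

Passing the resulting URL to any HTTP client would return an XML response if the endpoint is live; parsing that response is outside the scope of this sketch.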