Skeleton driven action recognition using an image-based spatial-temporal representation and convolution neural network

Silva, Vinícius; Soares, Filomena; Leão, Celina Pinto; Esteves, João Sena; Vercelli, Gianni

Bibliographic details
Main author:	Silva, Vinícius
Publication date:	2021
Other authors:	Soares, Filomena; Leão, Celina Pinto; Esteves, João Sena; Vercelli, Gianni
Document type:	Article
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/1822/74307
Abstract:	Individuals with Autism Spectrum Disorder (ASD) typically have difficulty engaging and interacting with their peers. Researchers have therefore been developing technological solutions as support tools for children with ASD. Social robots, one example of these solutions, are often unaware of their game partners, which prevents them from automatically adapting their behavior to the user. One source of information that can enrich this interaction and, consequently, adapt the system’s behavior is the recognition of the user’s actions from RGB cameras and/or depth sensors. The present work proposes a method to automatically detect, in real time, typical and stereotypical actions of children with ASD, using the Intel RealSense and the Nuitrack SDK to detect and extract the user’s joint coordinates. The pipeline starts by mapping the temporal and spatial dynamics of the joints onto a color image-based representation. Usually, the positions of the joints in the final image are clustered into groups. To verify whether the sequence of the joints in the final image representation influences the model’s performance, two main experiments were conducted: in the first, the order of the grouped joints in the sequence was changed, and in the second, the joints were randomly ordered. Statistical methods were used in the analysis of each experiment. The experiments found statistically significant differences with respect to the joint sequence in the image, indicating that the order of the joints might impact the model’s performance. The final model, a Convolutional Neural Network (CNN) trained on the different actions (typical and stereotypical), was used to classify the different patterns of behavior, achieving a mean accuracy of 92.4% ± 0.0% on the test data. The entire pipeline ran at 31 FPS on average.
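The image-based representation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact encoding, so everything below is an assumption made for illustration: each joint becomes one image row, each frame one column, and the (x, y, z) coordinates are min-max scaled into the R, G and B channels. The function name `skeleton_to_image` and the `joint_order` parameter are hypothetical; the parameter only shows how the row order of the joints, the factor studied in the experiments, could be varied.

```python
from typing import List, Optional, Sequence, Tuple

Joint = Tuple[float, float, float]   # (x, y, z) coordinates of one joint
Frame = Sequence[Joint]              # all joints at one time step
Pixel = Tuple[int, int, int]         # (R, G, B), each in 0..255


def skeleton_to_image(frames: Sequence[Frame],
                      joint_order: Optional[Sequence[int]] = None) -> List[List[Pixel]]:
    """Encode a skeleton sequence as an RGB image (rows = joints, columns = frames).

    Assumed encoding, not taken from the paper: x -> R, y -> G, z -> B,
    each axis min-max scaled over the whole sequence.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n_joints = len(frames[0])
    if joint_order is None:
        joint_order = list(range(n_joints))

    # Per-axis minimum and maximum over every joint in every frame.
    lows = [min(joint[axis] for frame in frames for joint in frame) for axis in range(3)]
    highs = [max(joint[axis] for frame in frames for joint in frame) for axis in range(3)]

    def scale(value: float, axis: int) -> int:
        span = highs[axis] - lows[axis]
        if span == 0:
            return 0  # constant axis: no dynamics to encode
        return round(255 * (value - lows[axis]) / span)

    # One row per joint (in the requested order), one column per frame.
    return [
        [tuple(scale(frame[j][axis], axis) for axis in range(3)) for frame in frames]
        for j in joint_order
    ]


# Toy sequence: 2 frames, 2 joints (values are made up for the example).
frames = [
    [(0.0, 0.0, 1.0), (1.0, 2.0, 1.0)],
    [(0.5, 1.0, 1.0), (1.0, 0.0, 3.0)],
]
image = skeleton_to_image(frames)
reordered = skeleton_to_image(frames, joint_order=[1, 0])
```

Changing `joint_order` permutes the rows of the image without changing any pixel values, which is one way the effect of joint ordering on a CNN could be isolated.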

Item metadata

id	RCAP_359ba23bae8dd9bf257ab4b48992fc84
oai_identifier_str	oai:repositorium.sdum.uminho.pt:1822/74307
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
spelling	This work has been supported by FCT (Fundação para a Ciência e Tecnologia) within the R&D Units Project Scope: UIDB/00319/2020. Vinicius Silva thanks FCT for the PhD scholarship SFRH/BD/SFRH/BD/133314/2017.
dc.title.none.fl_str_mv	Skeleton driven action recognition using an image-based spatial-temporal representation and convolution neural network
title	Skeleton driven action recognition using an image-based spatial-temporal representation and convolution neural network
author	Silva, Vinícius
author_facet	Silva, Vinícius; Soares, Filomena; Leão, Celina Pinto; Esteves, João Sena; Vercelli, Gianni
author_role	author
author2	Soares, Filomena; Leão, Celina Pinto; Esteves, João Sena; Vercelli, Gianni
author2_role	author author author author
dc.contributor.none.fl_str_mv	Universidade do Minho
dc.contributor.author.fl_str_mv	Silva, Vinícius; Soares, Filomena; Leão, Celina Pinto; Esteves, João Sena; Vercelli, Gianni
dc.subject.por.fl_str_mv	Human action recognition; Human computer interaction; Autism spectrum disorder; Convolutional neural network; Science & Technology
topic	Human action recognition; Human computer interaction; Autism spectrum disorder; Convolutional neural network; Science & Technology
publishDate	2021
dc.date.none.fl_str_mv	2021-06-25 2021-06-25T00:00:00Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/1822/74307
url	http://hdl.handle.net/1822/74307
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	Silva, V.; Soares, F.; Leão, C.P.; Esteves, J.S.; Vercelli, G. Skeleton Driven Action Recognition Using an Image-Based Spatial-Temporal Representation and Convolution Neural Network. Sensors 2021, 21, 4342. https://doi.org/10.3390/s21134342 1424-8220 1424-8220 10.3390/s21134342 34201991 https://www.mdpi.com/1424-8220/21/13/4342
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Multidisciplinary Digital Publishing Institute (MDPI)
publisher.none.fl_str_mv	Multidisciplinary Digital Publishing Institute (MDPI)
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_	1799132296114077696
