Bio-Inspired Modality Fusion for Active Speaker Detection

Bibliographic details
Main author: Assunção, Gustavo
Publication date: 2020
Other authors: Gonçalves, Nuno; Menezes, Paulo
Document type: Article
Language: English
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/10316/103726
https://doi.org/10.3390/app11083397
Abstract: Human beings have developed fantastic abilities to integrate information from various sensory sources exploring their inherent complementarity. Perceptual capabilities are therefore heightened, enabling, for instance, the well-known "cocktail party" and McGurk effects, i.e., speech disambiguation from a panoply of sound signals. This fusion ability is also key in refining the perception of sound source location, as in distinguishing whose voice is being heard in a group conversation. Furthermore, neuroscience has successfully identified the superior colliculus region in the brain as the one responsible for this modality fusion, with a handful of biological models having been proposed to approach its underlying neurophysiological process. Deriving inspiration from one of these models, this paper presents a methodology for effectively fusing correlated auditory and visual information for active speaker detection. Such an ability can have a wide range of applications, from teleconferencing systems to social robotics. The detection approach initially routes auditory and visual information through two specialized neural network structures. The resulting embeddings are fused via a novel layer based on the superior colliculus, whose topological structure emulates spatial neuron cross-mapping of unimodal perceptual fields. The validation process employed two publicly available datasets, with achieved results confirming and greatly surpassing initial expectations.
Record ID: RCAP_24d69ab926fcbe1c171e488e4c20a4ba
OAI identifier: oai:estudogeral.uc.pt:10316/103726
Repository ID: 7160
Keywords: artificial neural networks; multi-modal perception; human–robot interaction
Publication date (full): 2020-02-28
Publication status: published version
Relation (ISSN): 2076-3417
Access rights: open access
Publisher: MDPI AG
Institution: Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação (RCAAP)