Automated Organisation and Quality Analysis of User-Generated Audio Content

Bibliographic details
Main author: Mordido, Gonçalo Filipe Torcato
Publication date: 2017
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/27752
Abstract: The abundance and ubiquity of user-generated content have opened new horizons for the organization and analysis of vast, heterogeneous data, especially as the quality of recording devices has increased. Most activity on social networks today contains audio, either as part of a video file or as a standalone audio clip, so analyzing the audio features of such content is essential to understanding it. Such understanding would lead to better handling of ubiquitous data and would ultimately provide a better experience to the end user. The work discussed in this thesis revolves around using audio features to organize user-generated content crawled from social media websites, in particular concert clips, and to retrieve meaningful insights from it. Because of their redundancy and abundance (i.e., the existence of several recordings of a given event), recordings of musical shows are a very good use case from which to derive useful and practical conclusions within the scope of this thesis. Mechanisms that provide a better understanding of such content are presented and already partly implemented: audio clustering based on the existence of overlapping audio segments between different audio clips, audio segmentation that synchronizes and relates the clips of each cluster in time, and techniques to infer the audio quality of such clips. All the proposed methods use information retrieved from an audio fingerprinting algorithm, which is used to synchronize the different audio files; methods for filtering possible false positives of the algorithm are also presented. For the evaluation and validation of the proposed methods, we used one dataset made up of several audio recordings from different concert clips manually crawled from YouTube.
id RCAP_bc87042d8f364b764ee0cee10516f1e3
oai_identifier_str oai:run.unl.pt:10362/27752
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
author Mordido, Gonçalo Filipe Torcato
author_facet Mordido, Gonçalo Filipe Torcato
author_role author
dc.contributor.none.fl_str_mv Cavaco, Sofia
Magalhães, João
RUN
dc.contributor.author.fl_str_mv Mordido, Gonçalo Filipe Torcato
dc.subject.por.fl_str_mv User-generated content
Audio fingerprinting
Audio clustering
Audio segmentation
Audio quality
Supervised learning
Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
publishDate 2017
dc.date.none.fl_str_mv 2017-11
2017-11
2017-11-01T00:00:00Z
2018-01-05T10:43:10Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/27752
url http://hdl.handle.net/10362/27752
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação