Object detection for augmented reality applications

Bibliographic details
Main author: Santos, José Miguel Pinto
Publication date: 2022
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10773/35069
Abstract: Object detection in 2D digital images is a widely researched area because of its many applications. Improvements in algorithm performance and the growth of new approaches stem from integration with machine learning, namely the use of artificial neural networks in deep learning. The most commonly used methods are R-CNN (Region-based Convolutional Neural Networks) and its variants (Fast R-CNN and Faster R-CNN), while YOLO (You Only Look Once) is used for live-feed applications. Although a great deal of research addresses 2D object detection, a common problem that needs more attention is the pose estimation of the bounding boxes returned by the detection and classification process. Without an estimate of the camera's pose relative to the scene being analyzed, the bounding box does not match the object perfectly when the object is not parallel to or aligned with the camera's optical plane. Correcting the pose estimate matters because it allows text to be overlaid using augmented reality. Such an application benefits technicians who are troubleshooting equipment or learning how to perform difficult tasks. This dissertation explores three solutions to this problem. The first uses information from sensors external to the camera in a mobile device, giving the algorithm the device's position so that it can make the needed correction. The second method does not use external sensors; instead, it requires prior knowledge of the usual dimension ratios of the bounding box for each class and corrects the box until its ratio is close to the expected values. The third method requires prior knowledge of the local features of each object class in order to predict whether the object is aligned with the predicted bounding box, and makes adjustments until the ratio provided by the local features is within a threshold. After the correction, text is overlaid using augmented reality.
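The second method in the abstract (correcting a bounding box until its dimension ratio is close to the value expected for its class) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the dissertation's implementation: the `Box` type, the `EXPECTED_RATIO` table and its values, the `correct_box` function, and the tolerance and step parameters are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box given by its center and size (pixels)."""
    cx: float
    cy: float
    w: float
    h: float

    @property
    def ratio(self) -> float:
        return self.w / self.h


# Hypothetical per-class expected width/height ratios (illustrative values only).
EXPECTED_RATIO = {"monitor": 16 / 9, "door": 0.45}


def correct_box(box: Box, expected: float, tol: float = 0.05,
                step: float = 0.02, max_iter: int = 200) -> Box:
    """Shrink the over-long side in small steps until the box's
    width/height ratio is within a relative tolerance of `expected`.

    Assumption: the detector's box is too large along one axis because the
    object is not aligned with the camera plane, so only shrinking is tried.
    """
    if box.w <= 0 or box.h <= 0 or expected <= 0:
        raise ValueError("box dimensions and expected ratio must be positive")
    w, h = box.w, box.h
    for _ in range(max_iter):
        ratio = w / h
        if abs(ratio - expected) / expected <= tol:
            break
        if ratio > expected:
            w *= 1 - step  # too wide for this class: narrow the box
        else:
            h *= 1 - step  # too tall for this class: shorten the box
    return Box(box.cx, box.cy, w, h)


detected = Box(cx=100.0, cy=100.0, w=200.0, h=100.0)
corrected = correct_box(detected, EXPECTED_RATIO["monitor"])
print(corrected)
```

The box center is kept fixed and only one side is adjusted, which is a simplification; a real correction would also need to account for where inside the detected box the object actually sits.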
id RCAP_020a11c1170d3366f56e0167d6c7216b
oai_identifier_str oai:ria.ua.pt:10773/35069
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Object detection for augmented reality applications
title Object detection for augmented reality applications
author Santos, José Miguel Pinto
author_facet Santos, José Miguel Pinto
author_role author
dc.contributor.author.fl_str_mv Santos, José Miguel Pinto
dc.subject.por.fl_str_mv Object detection
Machine learning
Deep learning
You only look once
Augmented reality
topic Object detection
Machine learning
Deep learning
You only look once
Augmented reality
publishDate 2022
dc.date.none.fl_str_mv 2022-11-02T15:16:55Z
2022-07-18T00:00:00Z
2022-07-18
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10773/35069
url http://hdl.handle.net/10773/35069
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799137716596637696