Object detection for augmented reality applications

Santos, José Miguel Pinto

Bibliographic details
Main author:	Santos, José Miguel Pinto
Publication date:	2022
Document type:	Dissertation
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	http://hdl.handle.net/10773/35069
Abstract:	Object detection in digital images (2D) is a widely researched area owing to its countless applications. Improvements in the performance of the algorithms and the growth of new approaches are largely due to their integration with machine learning, namely deep learning with artificial neural networks. The most commonly used methods are R-CNN (Region-based Convolutional Neural Networks) and its variants (Fast R-CNN and Faster R-CNN), while YOLO (You Only Look Once) is used for live-feed applications. Although 2D object detection has been researched extensively, a common problem that needs more attention is the pose estimation of the bounding boxes returned when objects are detected and classified. Without an estimate of the camera's pose relative to the analyzed scene, the bounding box does not match the object exactly when the object is not parallel to, or aligned with, the camera's optical plane. Correcting for pose matters because it allows text to be overlaid on the object using augmented reality. This application is especially useful for helping technicians troubleshoot equipment or for learning how to perform difficult tasks. Three solutions to this problem are explored in this dissertation. The first uses information from sensors external to the camera: on a mobile device, these sensors give the algorithm the device's position, from which the necessary correction is made. The second method does not use external sensors. Instead, it requires prior knowledge of the usual dimension ratio (aspect ratio) of the bounding box for each class, and corrects the box until its ratio is close to the expected value.
The third method requires prior knowledge of the local features of each object class in order to predict whether the object is aligned with its predicted bounding box, and adjusts the box until the ratio given by the local features is within a threshold. After the correction, text is overlaid using augmented reality.
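As an illustration of the second method, the following minimal Python sketch corrects a detected box towards a per-class expected aspect ratio. It is not the dissertation's implementation: the `EXPECTED_RATIOS` table, the `Box` type, the `correct_box` function, the 5% tolerance, the 2% step and the choice to shrink the over-long side around a fixed centre are all hypothetical, since the abstract does not say how the box is adjusted.

```python
from dataclasses import dataclass

# Hypothetical per-class width/height ratios (assumption); real values would
# come from statistics over fronto-parallel images of each class.
EXPECTED_RATIOS = {"monitor": 16 / 9, "router": 2.0}


@dataclass
class Box:
    cx: float  # centre x (pixels)
    cy: float  # centre y (pixels)
    w: float   # width (pixels)
    h: float   # height (pixels)

    @property
    def ratio(self) -> float:
        return self.w / self.h


def correct_box(box: Box, label: str, tol: float = 0.05,
                step: float = 0.98, max_iter: int = 200) -> Box:
    """Hypothetical correction: shrink the over-long side of `box`, keeping
    its centre fixed, until w/h is within `tol` (relative) of the expected
    ratio for `label`, or until `max_iter` steps have been taken."""
    target = EXPECTED_RATIOS[label]
    w, h = box.w, box.h
    for _ in range(max_iter):
        r = w / h
        if abs(r - target) <= tol * target:
            break  # ratio is close enough to the expected value
        if r > target:
            w *= step  # too wide for this class: narrow the box
        else:
            h *= step  # too tall for this class: shorten the box
    return Box(box.cx, box.cy, w, h)


if __name__ == "__main__":
    fixed = correct_box(Box(320, 240, 300, 120), "monitor")
    print(f"{fixed.w:.1f} x {fixed.h:.1f}, ratio {fixed.ratio:.3f}")
```

For example, a 300 × 120 px detection labelled `monitor` (ratio 2.5) is narrowed step by step until its ratio falls within 5% of 16/9, keeping its height and centre. Growing the short side instead would be an equally plausible reading of the abstract.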

Item metadata

id	RCAP_020a11c1170d3366f56e0167d6c7216b
oai_identifier_str	oai:ria.ua.pt:10773/35069
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
dc.title.none.fl_str_mv	Object detection for augmented reality applications
author_role	author
dc.contributor.author.fl_str_mv	Santos, José Miguel Pinto
dc.subject.por.fl_str_mv	Object detection Machine learning Deep learning You only look once Augmented reality
publishDate	2022
dc.date.none.fl_str_mv	2022-11-02T15:16:55Z 2022-07-18T00:00:00Z 2022-07-18
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10773/35069
dc.language.iso.fl_str_mv	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799137716596637696
