Appearance-based global localization with a hybrid weightless-weighted neural network approach

Bibliographic details
Main author: Silva, Avelino Forechi
Publication date: 2018
Document type: Doctoral thesis
Language: English (eng)
Source title: Repositório Institucional da Universidade Federal do Espírito Santo (riUfes)
Full text: http://repositorio.ufes.br/handle/10/9876
Abstract: Currently, self-driving cars rely heavily on Global Positioning System (GPS) infrastructure, although there is an increasing demand for alternative global localization methods in GPS-denied environments. One such method is appearance-based global localization, which associates images of places with their corresponding positions. This is very appealing given the great number of geotagged photos publicly available and the ubiquity of devices fitted with ultra-high-resolution cameras, motion sensors and multicore processors. Appearance-based global localization can be framed as a topological or a metric solution, depending on whether it is modelled as a classification or a regression problem, respectively. Common topological approaches to the global localization problem involve solutions in the spatial dimension and, less frequently, in the temporal dimension, but not both simultaneously. This thesis proposes an integrated spatio-temporal solution based on an ensemble of kNN classifiers, where each classifier uses Dynamic Time Warping (DTW) and the Hamming distance to compare binary features extracted from sequences of images. Each base learner is fed with its own set of binary features extracted from the images. The solution is designed to solve the global localization problem in two phases: mapping and localization. During mapping, the system is trained with a sequence of images and associated locations that represent episodes experienced by a robot. During localization, it receives subsequences of images of the same environment and compares them to previously experienced episodes, trying to recollect the most similar “experience” in time and space at once. The system then outputs the positions where it “believes” these images were captured. Although the method is fast to train, computing the Hamming distance between training and test samples scales linearly with the number of training samples.
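The core comparison described above — DTW over sequences of binary features, with the Hamming distance as the per-frame cost — can be sketched as follows. This is an illustrative toy only; function names, feature codes, and the tiny "map" are hypothetical, not taken from the thesis.

```python
# Sketch of sequence matching with DTW + Hamming distance, as the ensemble's
# base learners do. Binary features are represented as Python ints.

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary feature codes."""
    return bin(a ^ b).count("1")

def dtw_hamming(query: list[int], episode: list[int]) -> int:
    """DTW alignment cost between two feature sequences, using the
    Hamming distance as the local (per-frame) cost."""
    n, m = len(query), len(episode)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = hamming(query[i - 1], episode[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # deletion
                                 D[i][j - 1],      # insertion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

# Localization: return the mapped position whose episode aligns best
# with the live subsequence of images (here, a two-entry toy map).
episodes = {(0, 0): [0b1010, 0b1011], (5, 0): [0b0101, 0b1111]}
query = [0b1010, 0b1111]
best_pose = min(episodes, key=lambda p: dtw_hamming(query, episodes[p]))
```

The linear scaling noted in the abstract is visible here: every query is compared against every stored episode, which is what the clustering-based memory compression described next addresses.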
Often, while building a map, one collects highly correlated and redundant data around the environment of interest, for instance because of high-frequency sensors or repeated trajectories. If not treated appropriately during the mapping phase, this extra data imposes an undesired burden on memory and runtime performance at test time. To tackle this problem, a clustering algorithm is employed to compress the network’s memory after mapping. For large-scale environments, the clustering algorithm is combined with a multi-hashing data structure, seeking the best compromise between classification accuracy, runtime performance and memory usage. So far, this encompasses only the topological part of the global localization problem, which is not precise enough for autonomous car operation. Instead of just recognizing places and outputting an associated pose, a global localization system should regress a pose given a current image of a place. However, inferring poses directly for city-scale scenes is unfeasible, at least at decimetric precision. The proposed approach is as follows: first, take a live image from the camera and use the localization system described above to return the most similar image-pose pair from a topological database built during the mapping phase; then, given the live and mapped images, a visual localization system outputs the relative pose between them. To solve this relative camera pose estimation problem, a Convolutional Neural Network (CNN) with a Siamese architecture is trained to take as input two images separated in time and space and to output a 6-Degree-of-Freedom (DoF) pose vector representing the relative position and orientation between the input images. Together, both systems solve the global localization problem using topological and metric information to approximate the actual robot pose.
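The two stages compose in a simple way: the topological stage supplies the pose of the most similar mapped image, and the metric stage supplies the relative pose between the live and mapped images, which is then applied to the mapped pose. A minimal sketch of that composition, shown in SE(2) for brevity (the thesis regresses a full 6-DoF vector) and with all names illustrative:

```python
# Composing a mapped (topological) pose with a regressed relative pose.
# Poses are (x, y, theta); the relative pose is expressed in the mapped
# camera's own frame, so its translation is rotated into the world frame.
import math

def compose(mapped_pose, relative_pose):
    """Apply a relative pose (dx, dy, dtheta) to a mapped pose (x, y, theta)."""
    x, y, th = mapped_pose
    dx, dy, dth = relative_pose
    return (x + dx * math.cos(th) - dy * math.sin(th),
            y + dx * math.sin(th) + dy * math.cos(th),
            th + dth)

# Stage 1 (topological) matches the live image to a map entry at (10, 4)
# heading 90 degrees; stage 2 (metric, a CNN in the thesis) regresses a
# 1 m forward offset. The composed estimate is the final global pose.
estimate = compose((10.0, 4.0, math.pi / 2), (1.0, 0.0, 0.0))
```

This makes the division of labor concrete: the topological stage only needs to be right about *which* mapped image is nearest; the metric stage refines *where* the robot is relative to it.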
In the proposed hybrid weightless-weighted neural network approach, the two systems combine naturally: the output of one is the input to the other, producing competitive results for the global localization task. The full approach is compared against a Real-Time Kinematic (RTK) GPS system and a Visual Simultaneous Localization and Mapping (SLAM) system. Experimental results show that the proposed combined approach correctly localizes an autonomous vehicle globally 90% of the time, with a mean error of 1.20 m, compared to 1.12 m achieved by the Visual SLAM system and 0.37 m achieved by the RTK-GPS 89% of the time.
dc.contributor.advisor-co1.fl_str_mv Santos, Thiago Oliveira dos
dc.contributor.advisor1.fl_str_mv Souza, Alberto Ferreira de
dc.contributor.author.fl_str_mv Silva, Avelino Forechi
dc.contributor.referee1.fl_str_mv Oliveira, Elias Silva de
dc.contributor.referee2.fl_str_mv Gonçalves, Claudine Santos Badue
dc.contributor.referee3.fl_str_mv Aguiar, Edilson de
dc.contributor.referee4.fl_str_mv Ciarelli, Patrick Marques
dc.subject.eng.fl_str_mv Convolutional neural networks
Weightless neural networks
Autonomous vehicle navigation
dc.subject.por.fl_str_mv Deep learning
Redes neurais convolucionais
Redes neurais sem peso
Carros autônomos
dc.subject.cnpq.fl_str_mv Ciência da Computação
dc.subject.br-rjbn.none.fl_str_mv Redes neurais (Computação)
Aprendizado do computador
Veículos autônomos
Visão por computador
Robótica
dc.subject.udc.none.fl_str_mv 004
publishDate 2018
dc.date.accessioned.fl_str_mv 2018-08-02T00:04:08Z
dc.date.available.fl_str_mv 2018-08-01
dc.date.issued.fl_str_mv 2018-02-02
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
dc.identifier.citation.fl_str_mv SILVA, Avelino Forechi. Appearance-based global localization with a hybrid weightless-weighted neural network approach. 2018. 104 f. Tese (Doutorado em Ciência da Computação) - Universidade Federal do Espírito Santo, Centro Tecnológico, Vitória, 2018.
dc.identifier.uri.fl_str_mv http://repositorio.ufes.br/handle/10/9876
dc.language.iso.fl_str_mv eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv Text
dc.publisher.none.fl_str_mv Universidade Federal do Espírito Santo
Doutorado em Ciência da Computação
dc.publisher.program.fl_str_mv Programa de Pós-Graduação em Informática
dc.publisher.initials.fl_str_mv UFES
dc.publisher.country.fl_str_mv BR
dc.publisher.department.fl_str_mv Centro Tecnológico
dc.source.none.fl_str_mv reponame:Repositório Institucional da Universidade Federal do Espírito Santo (riUfes)
instname:Universidade Federal do Espírito Santo (UFES)
instacron:UFES
bitstream.url.fl_str_mv http://repositorio.ufes.br/bitstreams/c0e4c9d9-1523-4b71-9abc-6d61fbddbaa0/download
bitstream.checksum.fl_str_mv d2752781eb44e824d19a829821d04907
bitstream.checksumAlgorithm.fl_str_mv MD5
repository.name.fl_str_mv Repositório Institucional da Universidade Federal do Espírito Santo (riUfes) - Universidade Federal do Espírito Santo (UFES)