Scene classification using a combination of aerial and ground images

Gabriel Lucas Silva Machado

Bibliographic details
Main author:	Gabriel Lucas Silva Machado
Publication date:	2021
Document type:	Master's dissertation
Language:	eng
Source title:	Repositório Institucional da UFMG
Full text:	http://hdl.handle.net/1843/38082
Abstract:	Aerial and orbital images provide useful information for a wide variety of tasks, such as disaster relief and urban planning. However, because these images see the Earth from only one point of view, some applications can benefit from complementary information provided by other perspectives of the scene, such as ground-level images. Despite the large number of public image repositories for both georeferenced photographs and aerial images (such as Google Maps and Google Street View), there is a lack of public datasets that support the development of approaches exploiting the complementarity of aerial and ground imagery. For this reason, this dissertation presents two new publicly available datasets, AiRound and CV-BrCT (Cross-View Brazilian Construction Types). The first contains triplets of images taken from the same geographic coordinate with different perspectives, obtained at various places around the world. Each triplet is composed of an aerial RGB image, a ground-level perspective image, and a Sentinel-2 sample. The second contains pairs of aerial and street-level images extracted from the southeast of Brazil. We conducted a series of experiments on both datasets with three main objectives: (i) exploring the complementary information in aerial and ground images by using multi-modal machine learning models to improve results, (ii) comparing different feature fusion approaches applied to several state-of-the-art Convolutional Neural Network architectures, and (iii) investigating alternatives for handling missing data in a multi-modal scenario. Experiments show that, compared to networks trained on a single view, feature fusion algorithms achieved gains of up to 0.15 and 0.20 in F1-Score on the AiRound and CV-BrCT datasets, respectively. Because it is not always possible to obtain paired aerial/ground samples of a place, we also designed a framework to handle scenarios with missing samples. Comparing single-view network classification with our framework integrated into a multi-view model, we achieved gains of up to 0.03 in F1-Score on both datasets. Thus, our missing-data completion framework proved more effective than classifying images with a single-view model alone.

Item metadata

id	UFMG_c0f2f3a10be39bd852423f09d2e04c2b
oai_identifier_str	oai:repositorio.ufmg.br:1843/38082
network_acronym_str	UFMG
network_name_str	Repositório Institucional da UFMG
repository_id_str
spelling	CNPq - Conselho Nacional de Desenvolvimento Científico e Tecnológico; FAPEMIG - Fundação de Amparo à Pesquisa do Estado de Minas Gerais; CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior; gabriel_dissertation_final.pdf (application/pdf, 20683421); license.txt (text/plain; charset=utf-8, 2118); Repositório de Publicações; PUB; https://repositorio.ufmg.br/oai; 2021-09-19 20:36:26.03
dc.title.pt_BR.fl_str_mv	Scene classification using a combination of aerial and ground images
dc.title.alternative.pt_BR.fl_str_mv	Combinando múltiplas perspectivas para classificação de cenas
title	Scene classification using a combination of aerial and ground images
spellingShingle	Scene classification using a combination of aerial and ground images; Gabriel Lucas Silva Machado; Remote sensing; Sensoriamento remoto; Image classification; Classificação de imagens; Multimodal machine learning; Aprendizado de máquina; Computação – Teses. Sensoriamento remoto – Teses. Classificação de imagens – Teses. Aprendizado de máquina – Teses.
title_short	Scene classification using a combination of aerial and ground images
title_full	Scene classification using a combination of aerial and ground images
title_fullStr	Scene classification using a combination of aerial and ground images
title_full_unstemmed	Scene classification using a combination of aerial and ground images
title_sort	Scene classification using a combination of aerial and ground images
author	Gabriel Lucas Silva Machado
author_facet	Gabriel Lucas Silva Machado
author_role	author
dc.contributor.advisor1.fl_str_mv	Jefersson Alex dos Santos
dc.contributor.advisor1Lattes.fl_str_mv	http://lattes.cnpq.br/2171782600728348
dc.contributor.advisor-co1.fl_str_mv	Keiller Nogueira
dc.contributor.referee1.fl_str_mv	Keiller Nogueira
dc.contributor.referee2.fl_str_mv	Clodoveu Augusto Davis Júnior
dc.contributor.referee3.fl_str_mv	Otávio Augusto Bizetto Penatti
dc.contributor.authorLattes.fl_str_mv	http://lattes.cnpq.br/7767025575268263
dc.contributor.author.fl_str_mv	Gabriel Lucas Silva Machado
contributor_str_mv	Jefersson Alex dos Santos; Keiller Nogueira; Keiller Nogueira; Clodoveu Augusto Davis Júnior; Otávio Augusto Bizetto Penatti
dc.subject.por.fl_str_mv	Remote sensing; Sensoriamento remoto; Image classification; Classificação de imagens; Multimodal machine learning; Aprendizado de máquina
topic	Remote sensing; Sensoriamento remoto; Image classification; Classificação de imagens; Multimodal machine learning; Aprendizado de máquina; Computação – Teses. Sensoriamento remoto – Teses. Classificação de imagens – Teses. Aprendizado de máquina – Teses.
dc.subject.other.pt_BR.fl_str_mv	Computação – Teses. Sensoriamento remoto – Teses. Classificação de imagens – Teses. Aprendizado de máquina – Teses.
publishDate	2021
dc.date.accessioned.fl_str_mv	2021-09-19T23:36:25Z
dc.date.available.fl_str_mv	2021-09-19T23:36:25Z
dc.date.issued.fl_str_mv	2021-03-31
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/1843/38082
dc.identifier.orcid.pt_BR.fl_str_mv	0000-0002-7133-6324
url	http://hdl.handle.net/1843/38082
identifier_str_mv	0000-0002-7133-6324
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.publisher.none.fl_str_mv	Universidade Federal de Minas Gerais
dc.publisher.program.fl_str_mv	Programa de Pós-Graduação em Ciência da Computação
dc.publisher.initials.fl_str_mv	UFMG
dc.publisher.country.fl_str_mv	Brasil
dc.publisher.department.fl_str_mv	ICX - DEPARTAMENTO DE CIÊNCIA DA COMPUTAÇÃO
publisher.none.fl_str_mv	Universidade Federal de Minas Gerais
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFMG instname:Universidade Federal de Minas Gerais (UFMG) instacron:UFMG
instname_str	Universidade Federal de Minas Gerais (UFMG)
instacron_str	UFMG
institution	UFMG
reponame_str	Repositório Institucional da UFMG
collection	Repositório Institucional da UFMG
bitstream.url.fl_str_mv	https://repositorio.ufmg.br/bitstream/1843/38082/3/gabriel_dissertation_final.pdf https://repositorio.ufmg.br/bitstream/1843/38082/4/license.txt
bitstream.checksum.fl_str_mv	a89146fa8fac95a01673c75b4f2023b4 cda590c95a0b51b4d15f60c9642ca272
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5
repository.name.fl_str_mv	Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG)
repository.mail.fl_str_mv
_version_	1803589424797188096
