Unsupervised multi-view multi-person 3D pose estimation
Main author: | SILVA, Diógenes Wallis de França |
---|---|
Publication date: | 2023 |
Document type: | Master's thesis (Dissertação) |
Language: | English |
Source: | Repositório Institucional da UFPE |
Full text: | https://repositorio.ufpe.br/handle/123456789/53559 |
Abstract: | Estimating the 3D poses of multiple persons in a multi-view scenario is an ongoing challenge in computer vision. Most current state-of-the-art methods for 3D pose estimation rely on supervised techniques, which require a large amount of labeled data for training. However, generating accurate 3D annotations is costly, time-consuming, and prone to errors. Therefore, a novel approach that does not require labeled data for 3D pose estimation is proposed. The proposed method, the Unsupervised Multi-View Multi-Person approach, uses a plane sweep method to generate 3D pose estimates. This approach designates one view as the target and the rest as reference views. First, the depth of each 2D skeleton in the target view is estimated to obtain the 3D poses. Then, instead of comparing the 3D poses with ground-truth poses, the calculated 3D poses are projected onto the reference views. The 2D projections are then compared with the 2D poses obtained using an off-the-shelf method. Finally, the 2D poses of the same pedestrian obtained from the target and reference views are matched for comparison. The matching process uses ground points to identify the corresponding 2D poses and compare them with the respective projections. To improve the accuracy of the proposed approach, a new reprojection loss based on the smooth L1 norm is introduced. This loss function accounts for the errors between the estimated 3D poses and their projections onto the reference views. The method was evaluated on the publicly available Campus dataset. The results show that the proposed approach achieves better accuracy than state-of-the-art unsupervised methods, with a 0.5-percentage-point improvement over the best geometric system. Furthermore, the proposed method outperforms some state-of-the-art supervised methods and achieves results comparable to the best supervised approach, with only a 0.2-percentage-point difference.
In conclusion, the Unsupervised Multi-View Multi-Person approach is a promising method for 3D pose estimation in multi-view scenarios. Its ability to generate accurate 3D pose estimates without relying on labeled data makes it valuable to the computer vision community. The evaluation results demonstrate the approach's effectiveness and its potential for future research in this area. |
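The core training signal described in the abstract, projecting estimated 3D joints onto a reference view and penalizing the gap to off-the-shelf 2D detections with a smooth L1 loss, can be sketched as follows. This is an illustrative reconstruction, not the thesis's implementation: the 3x4 camera-matrix convention, joint-array shapes, and the `beta` threshold are assumptions.

```python
import numpy as np

def smooth_l1(x, beta=1.0):
    # Smooth L1 (Huber-style) penalty, applied elementwise:
    # quadratic near zero, linear for large residuals.
    ax = np.abs(x)
    return np.where(ax < beta, 0.5 * ax ** 2 / beta, ax - 0.5 * beta)

def project(points_3d, P):
    # Project N x 3 world points to N x 2 pixel coordinates
    # with a 3 x 4 camera projection matrix P (pinhole model).
    homo = np.hstack([points_3d, np.ones((len(points_3d), 1))])  # N x 4
    uvw = homo @ P.T                                             # N x 3
    return uvw[:, :2] / uvw[:, 2:3]

def reprojection_loss(pose_3d, P_ref, pose_2d_ref):
    # Mean smooth-L1 distance between the projected 3D joints and
    # the 2D joints detected in a reference view.
    return float(smooth_l1(project(pose_3d, P_ref) - pose_2d_ref).mean())
```

In this sketch the loss is zero only when every projected joint lands exactly on its matched 2D detection; the quadratic region near zero keeps gradients small for near-correct depths, while the linear region limits the influence of outlier detections.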
Author: | SILVA, Diógenes Wallis de França (http://lattes.cnpq.br/4194630294196439) |
---|---|
Advisor: | TEICHRIEB, Veronica (http://lattes.cnpq.br/3355338790654065) |
Co-advisor: | LIMA, João Paulo Silva do Monte (http://lattes.cnpq.br/1916245590298485) |
Issue date: | 2023-07-28 |
Citation: | SILVA, Diógenes Wallis de França. Unsupervised multi-view multi-person 3D pose estimation. 2023. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de Pernambuco, Recife, 2023. |
Funding: | CAPES |
Program: | Programa de Pós-Graduação em Ciência da Computação, Universidade Federal de Pernambuco, Brasil |
License: | Attribution-NonCommercial-NoDerivs 3.0 Brazil (http://creativecommons.org/licenses/by-nc-nd/3.0/br/), open access |
Keywords: | Computational intelligence; Deep learning |