Representação multimodal para classificação de informação

Ito, Fernando Tadao

Bibliographic details
Main author:	Ito, Fernando Tadao
Publication date:	2018
Document type:	Master's thesis
Language:	por
Source title:	Repositório Institucional da UFSCAR
Full text:	https://repositorio.ufscar.br/handle/ufscar/10365
Abstract:	The most basic meaning of "multimodality" is the use of multiple means of information to compose an "artifact", a man-made object that expresses a concept. In day-to-day life, many media outlets use multimedia to express information: news stories combine video, narration, and ancillary text; theater plays tell a story through actors, gestures, and songs; electronic games take the player's physical gestures as actions and respond with visual or musical cues. To interpret such artifacts, we have to extract information from multiple media and combine it mathematically. Feature extraction is performed by mathematical models that receive raw data (texts, images, audio signals) and turn it into numerical vectors, where the distance between instances denotes their relation: nearby data points encode similar meanings. To create a multimodal semantic space, we use models that "fuse" information from multiple data types. In this work, we investigate the interaction between different modes of information representation in the formation of multimodal representations, presenting some of the most widely used algorithms for the vector representation of texts and images and ways to merge them. To measure the relative performance of each combination of methods, we use classification and similarity tasks on datasets of paired images and texts. We found that, on our datasets, different unimodal representation methods can lead to vastly different results. We also note that a representation's performance on the classification task does not determine whether it encodes the concept of an object: results on similarity tasks can differ.

Item metadata

id	SCAR_e13df0980d6d564d6509f11be5074791
oai_identifier_str	oai:repositorio.ufscar.br:ufscar/10365
network_acronym_str	SCAR
network_name_str	Repositório Institucional da UFSCAR
repository_id_str	4322
funding	Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq): 136958/2016-8; Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP): 16/13002-0
dc.title.por.fl_str_mv	Representação multimodal para classificação de informação
title	Representação multimodal para classificação de informação
author	Ito, Fernando Tadao
author_facet	Ito, Fernando Tadao
author_role	author
dc.contributor.authorlattes.por.fl_str_mv	http://lattes.cnpq.br/1816202262868538
dc.contributor.author.fl_str_mv	Ito, Fernando Tadao
dc.contributor.advisor1.fl_str_mv	Caseli, Helena de Medeiros
dc.contributor.advisor1Lattes.fl_str_mv	http://lattes.cnpq.br/6608582057810385
dc.contributor.advisor-co1.fl_str_mv	Moreira, Jander
dc.contributor.advisor-co1Lattes.fl_str_mv	http://lattes.cnpq.br/7638816418156415
dc.contributor.authorID.fl_str_mv	2cee8072-6c6f-46f1-b5d5-b06a8262d8fe
contributor_str_mv	Caseli, Helena de Medeiros Moreira, Jander
dc.subject.por.fl_str_mv	Representação multimodal Representação distribuída Inteligência artificial Aprendizado não-supervisionado
topic	Representação multimodal Representação distribuída Inteligência artificial Aprendizado não-supervisionado Multimodal representation Distributed representation Autoencoder Artificial intelligence Unsupervised learning CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::METODOLOGIA E TECNICAS DA COMPUTACAO
dc.subject.eng.fl_str_mv	Multimodal representation Distributed representation Autoencoder Artificial intelligence Unsupervised learning
dc.subject.cnpq.fl_str_mv	CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO::METODOLOGIA E TECNICAS DA COMPUTACAO
publishDate	2018
dc.date.accessioned.fl_str_mv	2018-08-14T20:03:27Z
dc.date.available.fl_str_mv	2018-08-14T20:03:27Z
dc.date.issued.fl_str_mv	2018-06-08
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.citation.fl_str_mv	ITO, Fernando Tadao. Representação multimodal para classificação de informação. 2018. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de São Carlos, São Carlos, 2018. Disponível em: https://repositorio.ufscar.br/handle/ufscar/10365.
dc.identifier.uri.fl_str_mv	https://repositorio.ufscar.br/handle/ufscar/10365
identifier_str_mv	ITO, Fernando Tadao. Representação multimodal para classificação de informação. 2018. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de São Carlos, São Carlos, 2018. Disponível em: https://repositorio.ufscar.br/handle/ufscar/10365.
url	https://repositorio.ufscar.br/handle/ufscar/10365
dc.language.iso.fl_str_mv	por
language	por
dc.relation.confidence.fl_str_mv	600 600
dc.relation.authority.fl_str_mv	e36d4e63-960d-4f5c-9c93-f8b7f5f93d65
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.publisher.none.fl_str_mv	Universidade Federal de São Carlos Câmpus São Carlos
dc.publisher.program.fl_str_mv	Programa de Pós-Graduação em Ciência da Computação - PPGCC
dc.publisher.initials.fl_str_mv	UFSCar
publisher.none.fl_str_mv	Universidade Federal de São Carlos Câmpus São Carlos
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFSCAR instname:Universidade Federal de São Carlos (UFSCAR) instacron:UFSCAR
instname_str	Universidade Federal de São Carlos (UFSCAR)
instacron_str	UFSCAR
institution	UFSCAR
reponame_str	Repositório Institucional da UFSCAR
collection	Repositório Institucional da UFSCAR
bitstream.url.fl_str_mv	https://repositorio.ufscar.br/bitstream/ufscar/10365/6/Disserta__o___Fernando_Tadao_Ito%20%281%29.pdf https://repositorio.ufscar.br/bitstream/ufscar/10365/7/license.txt https://repositorio.ufscar.br/bitstream/ufscar/10365/8/Disserta__o___Fernando_Tadao_Ito%20%281%29.pdf.txt https://repositorio.ufscar.br/bitstream/ufscar/10365/9/Disserta__o___Fernando_Tadao_Ito%20%281%29.pdf.jpg
bitstream.checksum.fl_str_mv	523a76df2f013f3d35368ac1ea64b6fd ae0398b6f8b235e40ad82cba6c50031d 7768258d1197b9bcfe6bfe9ecfb637d8 972455a9a73fb7cc9b13ba84838a8636
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5 MD5 MD5
repository.name.fl_str_mv	Repositório Institucional da UFSCAR - Universidade Federal de São Carlos (UFSCAR)
repository.mail.fl_str_mv
_version_	1802136344653201408
