Using Item Response Theory to evaluate feature relevance in missing data scenarios

Bibliographic details
Main author: REINALDO, Jessica Tais de Souza
Publication date: 2022
Document type: Master's thesis
Language: eng
Source title: Repositório Institucional da UFPE
Full text: https://repositorio.ufpe.br/handle/123456789/46381
Abstract: Item Response Theory (IRT) has historically been used to evaluate the latent abilities of human respondents to a set of items. Recently, efforts have been made to use IRT in classification problems, where the respondents are classifiers and the items are the instances of a dataset. Most of the initial works that tackled this problem used a dichotomous IRT model, which can model the classification problem only in terms of correct and wrong predictions. B3-IRT offers a powerful tool for analyzing datasets and classifiers because its response is continuous: instead of representing predictions just as right or wrong answers, the response can be represented by the probability of a correct prediction. Although the IRT formulation can provide rich information about the behavior of models towards the instances of a dataset, no previous work has investigated the application of IRT to rank features in an instance-based approach, or to evaluate how missing data can impact the IRT parameters for instances (difficulty and discrimination) and classifiers (ability). We propose a workflow that uses B3-IRT in missing data scenarios to evaluate the relevance of features both locally, for each instance of a dataset, and globally, for the whole dataset. In this workflow, data is missing at test time and missing values are filled in with imputed values, in order to evaluate how much the missing data can affect the ability of classifiers and the difficulty and discrimination of instances. This novel application represents an alternative to feature selection and feature ranking techniques that is capable of providing an overview of feature relevance at both the global and the instance level.
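The data-preparation side of the workflow described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the toy logistic classifiers, the mean imputation, the toy data, and all function names are assumptions made for illustration, and the step of fitting B3-IRT to the responses is omitted. The sketch only builds the continuous response matrix (probability of a correct prediction, with instances as items and classifiers as respondents), once on complete test data and once per feature with that feature imputed at test time.

```python
import math

def prob_correct(weights, bias, x, label):
    """Probability that a toy logistic classifier assigns to the true label."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    p1 = 1.0 / (1.0 + math.exp(-z))
    return p1 if label == 1 else 1.0 - p1

def impute_feature(X, feature, fill_value):
    """Copy X, replacing one feature by an imputed value (simulated missingness)."""
    return [[fill_value if k == feature else v for k, v in enumerate(row)]
            for row in X]

def response_matrix(classifiers, X, y):
    """Rows are instances (items); columns are classifiers (respondents)."""
    return [[prob_correct(w, b, x, label) for (w, b) in classifiers]
            for x, label in zip(X, y)]

# Hypothetical toy data and fixed "trained" classifiers (assumptions).
X_train = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
X_test = [[0.5, 1.0], [2.5, 0.0]]
y_test = [0, 1]
classifiers = [([2.0, 0.0], -3.0), ([1.0, 0.5], -2.0)]

# Mean imputation with training-set statistics (one possible imputer).
train_means = [sum(col) / len(col) for col in zip(*X_train)]

baseline = response_matrix(classifiers, X_test, y_test)

# Per feature and per instance: average drop in response across classifiers
# when that feature is missing at test time and imputed.
response_drop = {}
for feature, mean in enumerate(train_means):
    imputed = response_matrix(
        classifiers, impute_feature(X_test, feature, mean), y_test)
    response_drop[feature] = [
        sum(b - m for b, m in zip(base_row, imp_row)) / len(base_row)
        for base_row, imp_row in zip(baseline, imputed)
    ]
```

In the workflow the abstract describes, matrices such as `baseline` and each imputed variant would be passed to B3-IRT to estimate difficulty, discrimination, and ability. The raw drop in response computed here is only a simple stand-in for comparing those estimated parameters.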
id UFPE_f0b9f9fdd64614b70ef3c032b257ea97
oai_identifier_str oai:repositorio.ufpe.br:123456789/46381
network_acronym_str UFPE
network_name_str Repositório Institucional da UFPE
repository_id_str 2221
Funding agency: CNPq
Abstract (translated from the Portuguese version): Item Response Theory (IRT) has historically been used to evaluate the latent abilities of human respondents as they answer a set of questions, called the items of the IRT problem. Recently, the scientific community has begun to propose solutions that use IRT to solve classification problems, where the respondents are classifiers and the items are the instances of a dataset. Most of the initial works that tried to solve this problem used a dichotomous IRT model, which can model the classification problem only in terms of correct and incorrect predictions. B3-IRT offers a more powerful formulation for this application of IRT, since the response of this model is continuous; instead of representing the predictions of a classifier only as right or wrong answers (dichotomous), the response can be represented by the probability of a correct prediction. Although the IRT formulation can carry a great deal of information about the behavior of models with respect to the instances of a dataset, no previous work has investigated the application of IRT to rank the relevance or importance of the variables of a dataset in an approach based on the instances themselves, or to evaluate how missing data can affect the IRT parameters for instances (difficulty and discrimination) and classifiers (ability). In this work, we propose a workflow that uses B3-IRT in missing data scenarios to evaluate the relevance of variables both locally, for each instance, and globally, for the whole dataset. In this workflow, missing data occurs only at test time, and the missing values are filled in with imputed values, in order to evaluate how much missing data can affect the ability of classifiers and the difficulty and discrimination of instances. The approach proposed in this work represents an alternative to variable selection and ranking techniques, capable of providing an overview of the relevance of a dataset's variables at both the global and the instance level.
dc.title.pt_BR.fl_str_mv Using Item Response Theory to evaluate feature relevance in missing data scenarios
title Using Item Response Theory to evaluate feature relevance in missing data scenarios
author REINALDO, Jessica Tais de Souza
author_facet REINALDO, Jessica Tais de Souza
author_role author
dc.contributor.authorLattes.pt_BR.fl_str_mv http://lattes.cnpq.br/0857916208146061
dc.contributor.advisorLattes.pt_BR.fl_str_mv http://lattes.cnpq.br/2984888073123287
dc.contributor.advisor-coLattes.pt_BR.fl_str_mv http://lattes.cnpq.br/4640945954423515
dc.contributor.author.fl_str_mv REINALDO, Jessica Tais de Souza
dc.contributor.advisor1.fl_str_mv PRUDÊNCIO, Ricardo Bastos Cavalcante
dc.contributor.advisor-co1.fl_str_mv SILVA FILHO, Telmo de Menezes e
contributor_str_mv PRUDÊNCIO, Ricardo Bastos Cavalcante
SILVA FILHO, Telmo de Menezes e
dc.subject.por.fl_str_mv Inteligência artificial (Artificial intelligence)
Ranqueamento de variáveis (Feature ranking)
topic Inteligência artificial (Artificial intelligence)
Ranqueamento de variáveis (Feature ranking)
description Item Response Theory (IRT) has historically been used to evaluate the latent abilities of human respondents to a set of items. Recently, efforts have been made to use IRT in classification problems, where the respondents are classifiers and the items are the instances of a dataset. Most of the initial works that tackled this problem used a dichotomous IRT model, which can model the classification problem only in terms of correct and wrong predictions. B3-IRT offers a powerful tool for analyzing datasets and classifiers because its response is continuous: instead of representing predictions just as right or wrong answers, the response can be represented by the probability of a correct prediction. Although the IRT formulation can provide rich information about the behavior of models towards the instances of a dataset, no previous work has investigated the application of IRT to rank features in an instance-based approach, or to evaluate how missing data can impact the IRT parameters for instances (difficulty and discrimination) and classifiers (ability). We propose a workflow that uses B3-IRT in missing data scenarios to evaluate the relevance of features both locally, for each instance of a dataset, and globally, for the whole dataset. In this workflow, data is missing at test time and missing values are filled in with imputed values, in order to evaluate how much the missing data can affect the ability of classifiers and the difficulty and discrimination of instances. This novel application represents an alternative to feature selection and feature ranking techniques that is capable of providing an overview of feature relevance at both the global and the instance level.
publishDate 2022
dc.date.accessioned.fl_str_mv 2022-09-13T18:54:57Z
dc.date.available.fl_str_mv 2022-09-13T18:54:57Z
dc.date.issued.fl_str_mv 2022-03-29
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv REINALDO, Jessica Tais de Souza. Using Item Response Theory to evaluate feature relevance in missing data scenarios. 2022. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de Pernambuco, Recife, 2022.
dc.identifier.uri.fl_str_mv https://repositorio.ufpe.br/handle/123456789/46381
identifier_str_mv REINALDO, Jessica Tais de Souza. Using Item Response Theory to evaluate feature relevance in missing data scenarios. 2022. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de Pernambuco, Recife, 2022.
url https://repositorio.ufpe.br/handle/123456789/46381
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv http://creativecommons.org/licenses/by-nc-nd/3.0/br/
info:eu-repo/semantics/openAccess
rights_invalid_str_mv http://creativecommons.org/licenses/by-nc-nd/3.0/br/
eu_rights_str_mv openAccess
dc.publisher.none.fl_str_mv Universidade Federal de Pernambuco
dc.publisher.program.fl_str_mv Programa de Pós-Graduação em Ciência da Computação
dc.publisher.initials.fl_str_mv UFPE
dc.publisher.country.fl_str_mv Brasil
publisher.none.fl_str_mv Universidade Federal de Pernambuco
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFPE
instname:Universidade Federal de Pernambuco (UFPE)
instacron:UFPE
instname_str Universidade Federal de Pernambuco (UFPE)
instacron_str UFPE
institution UFPE
reponame_str Repositório Institucional da UFPE
collection Repositório Institucional da UFPE
bitstream.url.fl_str_mv https://repositorio.ufpe.br/bitstream/123456789/46381/4/DISSERTA%c3%87%c3%83O%20Jessica%20Tais%20de%20Souza%20Reinaldo.pdf.txt
https://repositorio.ufpe.br/bitstream/123456789/46381/5/DISSERTA%c3%87%c3%83O%20Jessica%20Tais%20de%20Souza%20Reinaldo.pdf.jpg
https://repositorio.ufpe.br/bitstream/123456789/46381/2/license_rdf
https://repositorio.ufpe.br/bitstream/123456789/46381/1/DISSERTA%c3%87%c3%83O%20Jessica%20Tais%20de%20Souza%20Reinaldo.pdf
https://repositorio.ufpe.br/bitstream/123456789/46381/3/license.txt
bitstream.checksum.fl_str_mv 4610e3d96d512afe32a99aeeecd13cc0
351748bfff75fc4eb755e130140e8780
e39d27027a6cc9cb039ad269a5db8e34
1ebe1877a3d53f7faaecdcdf67315bd1
6928b9260b07fb2755249a5ca9903395
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositório Institucional da UFPE - Universidade Federal de Pernambuco (UFPE)
repository.mail.fl_str_mv attena@ufpe.br
_version_ 1802310876386033664