Covariate shift adaptation and dataset shift decomposition in machine learning

Bibliographic details
Main author: Polo, Felipe Maia
Publication date: 2021
Document type: Master's thesis
Language: eng
Source title: Biblioteca Digital de Teses e Dissertações da USP
Full text: https://www.teses.usp.br/teses/disponiveis/45/45133/tde-03022022-234955/
Abstract: In supervised learning, we often have access to only a limited sample, in size or quality (e.g., lacking labels), from the population/distribution of interest, for which we want to build predictive models. However, we may have less restricted access to data sampled from another population that is more or less similar to the one of interest. Training models using only data from the population of interest may be impossible or may yield sub-optimal models, so it is worthwhile to use data from the other population to obtain better results or to make training possible. In these situations, because the distribution of interest differs from the one we can sample with few restrictions, we say that there is dataset shift. Under dataset shift, employing domain adaptation techniques when training supervised models is essential for theoretical guarantees of good results in the population of interest. The two kinds of dataset shift discussed in this work are covariate shift and concept drift/shift. The main objectives of this work are: (i) to review the main concepts and methods related to covariate shift and covariate shift adaptation; (ii) to propose contributions to the covariate shift adaptation literature, connecting concepts present in the modern literature; and (iii) to propose the decomposition of dataset shift into covariate shift and expected concept drift/shift as a new approach to better understand situations involving dataset shift.
id USP_3095bb4476439a8086c923a5936bfa42
oai_identifier_str oai:teses.usp.br:tde-03022022-234955
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
repository_id_str 2721
dc.title.none.fl_str_mv Covariate shift adaptation and dataset shift decomposition in machine learning
Adaptação para covariate shift e decomposição do dataset shift no aprendizado de máquina
title Covariate shift adaptation and dataset shift decomposition in machine learning
author Polo, Felipe Maia
author_facet Polo, Felipe Maia
author_role author
dc.contributor.none.fl_str_mv Vicente, Renato
dc.contributor.author.fl_str_mv Polo, Felipe Maia
dc.subject.por.fl_str_mv Concept drift
Covariate shift
Dataset shift
Dataset shift decomposition
Dimensionality
Domain adaptation
Effective sample size
Machine learning
Statistics
publishDate 2021
dc.date.none.fl_str_mv 2021-10-15
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://www.teses.usp.br/teses/disponiveis/45/45133/tde-03022022-234955/
url https://www.teses.usp.br/teses/disponiveis/45/45133/tde-03022022-234955/
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv Release the content for public access.
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Release the content for public access.
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
instname_str Universidade de São Paulo (USP)
instacron_str USP
institution USP
reponame_str Biblioteca Digital de Teses e Dissertações da USP
collection Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br || atendimento@aguia.usp.br
_version_ 1809090328426184704