Resampling strategies for regression

Luís Torgo; Paula Oliveira Branco; Rita Paula Ribeiro; Pfahringer,B

Resampling strategies for regression

Detalhes bibliográficos
Autor(a) principal:	Luís Torgo
Data de Publicação:	2015
Outros Autores:	Paula Oliveira Branco, Rita Paula Ribeiro, Pfahringer,B
Tipo de documento:	Artigo
Idioma:	eng
Título da fonte:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Texto Completo:	http://repositorio.inesctec.pt/handle/123456789/4625 http://dx.doi.org/10.1111/exsy.12081
Resumo:	Several real world prediction problems involve forecasting rare values of a target variable. When this variable is nominal, we have a problem of class imbalance that was thoroughly studied within machine learning. For regression tasks, where the target variable is continuous, few works exist addressing this type of problem. Still, important applications involve forecasting rare extreme values of a continuous target variable. This paper describes a contribution to this type of tasks. Namely, we propose to address such tasks by resampling approaches that change the distribution of the given data set to decrease the problem of imbalance between the rare target cases and the most frequent ones. We present two modifications of well-known resampling strategies for classification tasks: the under-sampling and the synthetic minority over-sampling technique (SMOTE) methods. These modifications allow the use of these strategies on regression tasks where the goal is to forecast rare extreme values of the target variable. In an extensive set of experiments, we provide empirical evidence for the superiority of our proposals for these particular regression tasks. The proposed resampling methods can be used with any existing regression algorithm, which means that they are general tools for addressing problems of forecasting rare extreme values of a continuous target variable.

Metadados do item

id	RCAP_082b0e88187c2894948b1b11f4dbea47
oai_identifier_str	oai:repositorio.inesctec.pt:123456789/4625
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
spelling	Resampling strategies for regressionSeveral real world prediction problems involve forecasting rare values of a target variable. When this variable is nominal, we have a problem of class imbalance that was thoroughly studied within machine learning. For regression tasks, where the target variable is continuous, few works exist addressing this type of problem. Still, important applications involve forecasting rare extreme values of a continuous target variable. This paper describes a contribution to this type of tasks. Namely, we propose to address such tasks by resampling approaches that change the distribution of the given data set to decrease the problem of imbalance between the rare target cases and the most frequent ones. We present two modifications of well-known resampling strategies for classification tasks: the under-sampling and the synthetic minority over-sampling technique (SMOTE) methods. These modifications allow the use of these strategies on regression tasks where the goal is to forecast rare extreme values of the target variable. In an extensive set of experiments, we provide empirical evidence for the superiority of our proposals for these particular regression tasks. The proposed resampling methods can be used with any existing regression algorithm, which means that they are general tools for addressing problems of forecasting rare extreme values of a continuous target variable.2017-12-21T12:24:27Z2015-01-01T00:00:00Z2015info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/articleapplication/pdfhttp://repositorio.inesctec.pt/handle/123456789/4625http://dx.doi.org/10.1111/exsy.12081engLuís TorgoPaula Oliveira BrancoRita Paula RibeiroPfahringer,Binfo:eu-repo/semantics/openAccessreponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãoinstacron:RCAAP2023-05-15T10:20:25Zoai:repositorio.inesctec.pt:123456789/4625Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireopendoar:71602024-03-19T17:53:05.602945Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãofalse
dc.title.none.fl_str_mv	Resampling strategies for regression
title	Resampling strategies for regression
spellingShingle	Resampling strategies for regression Luís Torgo
title_short	Resampling strategies for regression
title_full	Resampling strategies for regression
title_fullStr	Resampling strategies for regression
title_full_unstemmed	Resampling strategies for regression
title_sort	Resampling strategies for regression
author	Luís Torgo
author_facet	Luís Torgo Paula Oliveira Branco Rita Paula Ribeiro Pfahringer,B
author_role	author
author2	Paula Oliveira Branco Rita Paula Ribeiro Pfahringer,B
author2_role	author author author
dc.contributor.author.fl_str_mv	Luís Torgo Paula Oliveira Branco Rita Paula Ribeiro Pfahringer,B
description	Several real world prediction problems involve forecasting rare values of a target variable. When this variable is nominal, we have a problem of class imbalance that was thoroughly studied within machine learning. For regression tasks, where the target variable is continuous, few works exist addressing this type of problem. Still, important applications involve forecasting rare extreme values of a continuous target variable. This paper describes a contribution to this type of tasks. Namely, we propose to address such tasks by resampling approaches that change the distribution of the given data set to decrease the problem of imbalance between the rare target cases and the most frequent ones. We present two modifications of well-known resampling strategies for classification tasks: the under-sampling and the synthetic minority over-sampling technique (SMOTE) methods. These modifications allow the use of these strategies on regression tasks where the goal is to forecast rare extreme values of the target variable. In an extensive set of experiments, we provide empirical evidence for the superiority of our proposals for these particular regression tasks. The proposed resampling methods can be used with any existing regression algorithm, which means that they are general tools for addressing problems of forecasting rare extreme values of a continuous target variable.
publishDate	2015
dc.date.none.fl_str_mv	2015-01-01T00:00:00Z 2015 2017-12-21T12:24:27Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://repositorio.inesctec.pt/handle/123456789/4625 http://dx.doi.org/10.1111/exsy.12081
url	http://repositorio.inesctec.pt/handle/123456789/4625 http://dx.doi.org/10.1111/exsy.12081
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799131606111223808

Resampling strategies for regression

Registros relacionados