The NoiseFiltersR Package: Label Noise Preprocessing in R
Main author: | Morales, Pablo |
---|---|
Publication date: | 2017 |
Other authors: | Luengo, Julian; Garcia, Luis P. F.; Lorena, Ana C. [UNIFESP]; de Carvalho, Andre C. P. L. F.; Herrera, Francisco |
Document type: | Article |
Language: | eng |
Source title: | Repositório Institucional da UNIFESP |
Full text: | https://journal.r-project.org/archive/2017/RJ-2017-027/index.html https://repositorio.unifesp.br/handle/11600/53705 |
Abstract: | In Data Mining, the value of extracted knowledge is directly related to the quality of the data used. This makes data preprocessing one of the most important steps in the knowledge discovery process. A common problem affecting data quality is the presence of noise. A training set with label noise can reduce the predictive performance of classification learning techniques and increase the overfitting of classification models. In this work we present the NoiseFiltersR package. It contains the first extensive R implementation of classical and state-of-the-art label noise filters, which are the most common techniques for preprocessing label noise. The algorithms used to implement the label noise filters are appropriately documented and referenced. They can be called in an R-user-friendly manner, and their results are unified by means of the "filter" class, which also benefits from adapted print and summary methods. |
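The abstract describes label noise filters whose results are unified in a single result class with print and summary methods. The idea can be illustrated with a minimal, language-neutral sketch in Python. This is not the NoiseFiltersR API and not one of the package's algorithms as implemented there: the names `FilterResult` and `knn_label_filter` are hypothetical, and the rule used (drop an instance whose label disagrees with the majority label of its k nearest neighbours) is only one simple example of a label noise filter.

```python
from collections import Counter
from dataclasses import dataclass
from math import dist


@dataclass
class FilterResult:
    """Hypothetical result container, loosely analogous to a unified filter result."""
    clean_x: list
    clean_y: list
    removed_idx: list

    def summary(self) -> str:
        # Report how many instances the filter discarded.
        total = len(self.clean_y) + len(self.removed_idx)
        return f"removed {len(self.removed_idx)} of {total} instances"


def knn_label_filter(x, y, k=3):
    """Remove instances whose label disagrees with the majority of their k nearest neighbours."""
    removed = []
    for i, xi in enumerate(x):
        # Rank every other instance by Euclidean distance to instance i.
        others = sorted((j for j in range(len(x)) if j != i),
                        key=lambda j: dist(xi, x[j]))
        votes = Counter(y[j] for j in others[:k])
        majority_label, _ = votes.most_common(1)[0]
        if majority_label != y[i]:
            removed.append(i)
    keep = [i for i in range(len(x)) if i not in removed]
    return FilterResult([x[i] for i in keep], [y[i] for i in keep], removed)


# Two well-separated clusters; index 3 carries a deliberately wrong label.
x = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
     (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
y = ["a", "a", "a", "b", "b", "b", "b", "b"]

result = knn_label_filter(x, y, k=3)
print(result.summary())
print(result.removed_idx)
```

Returning one result object from every filter, rather than a bare cleaned data set, is what allows shared reporting methods such as a summary to work across different filtering algorithms.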
id |
UFSP_5441acd8d3b7c478639dc4a2dbafced8 |
---|---|
oai_identifier_str |
oai:repositorio.unifesp.br:11600/53705 |
network_acronym_str |
UFSP |
network_name_str |
Repositório Institucional da UNIFESP |
repository_id_str |
3465 |
affiliations |
Univ Granada, Dept Comp Sci & Artificial Intelligence, E-18071 Granada, Spain; Univ Sao Paulo, Inst Ciencias Matemat & Comp, Trabalhador Sao Carlense Av 400, BR-13560970 Sao Carlos, SP, Brazil; Univ Fed Sao Paulo, Inst Ciencia & Tecnol, Talim St 330, BR-12231280 Sao Jose Dos Campos, SP, Brazil |
funding |
Spanish Research Project: TIN2014-57251-P; Andalusian Research Plan: P11-TIC-7765; CeMEAI-FAPESP: 2013/07375-0; FAPESP: 2012/22608-8; FAPESP: 2011/14602-7 |
indexed in |
Web of Science |
dc.title.en.fl_str_mv |
The NoiseFiltersR Package: Label Noise Preprocessing in R |
author |
Morales, Pablo |
author_facet |
Morales, Pablo Luengo, Julian Garcia, Luis P. F. Lorena, Ana C. [UNIFESP] de Carvalho, Andre C. P. L. F. Herrera, Francisco |
author_role |
author |
author2 |
Luengo, Julian Garcia, Luis P. F. Lorena, Ana C. [UNIFESP] de Carvalho, Andre C. P. L. F. Herrera, Francisco |
author2_role |
author author author author author |
publishDate |
2017 |
dc.date.issued.fl_str_mv |
2017 |
dc.date.accessioned.fl_str_mv |
2020-06-26T16:30:42Z |
dc.date.available.fl_str_mv |
2020-06-26T16:30:42Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.].fl_str_mv |
https://journal.r-project.org/archive/2017/RJ-2017-027/index.html |
dc.identifier.citation.fl_str_mv |
R Journal. Wien, v. 9, n. 1, p. 219-228, 2017. |
dc.identifier.uri.fl_str_mv |
https://repositorio.unifesp.br/handle/11600/53705 |
dc.identifier.issn.none.fl_str_mv |
2073-4859 |
dc.identifier.file.none.fl_str_mv |
WOS000404756200015.pdf |
dc.identifier.wos.none.fl_str_mv |
WOS:000404756200015 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.ispartof.none.fl_str_mv |
R Journal |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
219-228 |
dc.coverage.none.fl_str_mv |
Wien |
dc.publisher.none.fl_str_mv |
R Foundation Statistical Computing |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UNIFESP instname:Universidade Federal de São Paulo (UNIFESP) instacron:UNIFESP |
instname_str |
Universidade Federal de São Paulo (UNIFESP) |
instacron_str |
UNIFESP |
institution |
UNIFESP |
bitstream.url.fl_str_mv |
${dspace.ui.url}/bitstream/11600/53705/1/WOS000404756200015.pdf ${dspace.ui.url}/bitstream/11600/53705/8/WOS000404756200015.pdf.txt ${dspace.ui.url}/bitstream/11600/53705/10/WOS000404756200015.pdf.jpg |
bitstream.checksum.fl_str_mv |
4c76642e1f9549f8c102674a2998f959 c8561eb53e470fc810dbdd2d18f48c9e 17b115e2e7f69b83e4b2255169cfecb7 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UNIFESP - Universidade Federal de São Paulo (UNIFESP) |
repository.mail.fl_str_mv |
|
_version_ |
1802764267225612288 |