Detection of outliers in multivariate data: a method based on clustering and robust estimators

Bibliographic details
Main author: Carla M. Santos Pereira
Publication date: 2002
Other authors: Ana M. Pires
Document type: Book
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: https://hdl.handle.net/10216/65794
Abstract: Outlier identification is important in many applications of multivariate analysis, either because there is a specific interest in finding anomalous observations or as a pre-processing task before applying a multivariate method, in order to protect the results from the possibly harmful effects of those observations. It is also of great interest in supervised classification (or discriminant analysis) if, when predicting group membership, one wants the option of labelling an observation as "does not belong to any of the available groups". The identification of outliers in multivariate data is usually based on the Mahalanobis distance. The use of robust estimates of the mean and the covariance matrix is advised in order to avoid the masking effect (Rousseeuw and Leroy, 1985; Rousseeuw and van Zomeren, 1990; Rocke and Woodruff, 1996; Becker and Gather, 1999). However, the performance of these rules still depends heavily on multivariate normality of the bulk of the data. The aim of the method described here is to remove this dependence.
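The distance-based rule mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' clustering-based method: it is the plain classical rule, restricted to two-dimensional data so that it needs only the standard library. The function names, the 0.975 cutoff level, and the toy data are all assumptions made for illustration. For two dimensions the chi-square quantile has the closed form -2 ln(1 - level), which is used in place of a statistics library.

```python
import math

def mean_and_cov(points):
    """Classical (non-robust) sample mean and covariance of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def squared_mahalanobis(point, center, cov):
    """Squared Mahalanobis distance, using the explicit 2x2 matrix inverse."""
    (mx, my), (sxx, sxy, syy) = center, cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mx, point[1] - my
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def chi2_cutoff_2d(level=0.975):
    """Chi-square quantile with 2 degrees of freedom (closed form)."""
    return -2.0 * math.log(1.0 - level)

def flag_outliers(points, level=0.975):
    """Flag points whose squared distance exceeds the chi-square cutoff."""
    center, cov = mean_and_cov(points)
    cutoff = chi2_cutoff_2d(level)
    return [squared_mahalanobis(p, center, cov) > cutoff for p in points]

# Toy data (assumed): a 3x3 grid as the bulk plus one distant point.
points = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)] + [(10, 10)]
flags = flag_outliers(points)
```

On this toy data only the point (10, 10) is flagged. Because `mean_and_cov` uses classical estimates, the outlier itself inflates the covariance, which is the mechanism behind the masking effect; a robust estimator of location and scatter (for example the minimum covariance determinant) would be substituted for that function to follow the advice cited in the abstract.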
Subjects: Natural sciences (Ciências exactas e naturais)
Version: published version
DOI: 10.1007/978-3-642-57489-4_41
Rights: open access
Format: application/pdf
Institution: Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação (RCAAP)