Detection of outliers in multivariate data: a method based on clustering and robust estimators

Carla M. Santos Pereira; Ana M. Pires

Bibliographic details
Main author:	Carla M. Santos Pereira
Publication date:	2002
Other authors:	Ana M. Pires
Document type:	Book
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	https://hdl.handle.net/10216/65794
Abstract:	Outlier identification is important in many applications of multivariate analysis, either because there is a specific interest in finding anomalous observations or as a pre-processing task before applying some multivariate method, in order to protect the results from the possibly harmful effects of those observations. It is also of great interest in supervised classification (or discriminant analysis) if, when predicting group membership, one wants the possibility of labelling an observation as not belonging to any of the available groups. The identification of outliers in multivariate data is usually based on the Mahalanobis distance. The use of robust estimates of the mean and the covariance matrix is advised in order to avoid the masking effect (Rousseeuw and Leroy, 1985; Rousseeuw and von Zomeren, 1990; Rocke and Woodruff, 1996; Becker and Gather, 1999). However, the performance of these rules is still highly dependent on multivariate normality of the bulk of the data. The aim of the method described here is to remove this dependence.
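
The distance-based rule mentioned in the abstract can be illustrated with a minimal Python sketch. It shows only the classical (non-robust) Mahalanobis rule for bivariate data, not the authors' clustering-based method, which this record does not describe. The function names, the 0.975 quantile, and the simulated data are assumptions made for illustration. For two variables, the chi-square quantile with 2 degrees of freedom has the closed form -2 ln(1 - q), which is why the sketch is restricted to that case.

```python
import numpy as np


def mahalanobis_sq(X, center, cov):
    """Squared Mahalanobis distance of each row of X from `center`."""
    diff = X - center
    # Solve cov @ z = diff.T rather than forming the inverse explicitly.
    z = np.linalg.solve(cov, diff.T)
    return np.einsum("ij,ji->i", diff, z)


def flag_outliers_2d(X, quantile=0.975):
    """Classical rule for bivariate data (hypothetical helper).

    Uses the sample mean and sample covariance, so it is NOT robust:
    a group of outliers can inflate the covariance and mask itself.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim != 2 or X.shape[1] != 2:
        raise ValueError("this sketch handles exactly two variables")
    center = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    d2 = mahalanobis_sq(X, center, cov)
    # Chi-square quantile with 2 degrees of freedom, in closed form.
    cutoff = -2.0 * np.log(1.0 - quantile)
    return d2 > cutoff, d2, cutoff


# Simulated data (an assumption): 200 standard normal points, one planted outlier.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X[0] = [8.0, -8.0]
flags, d2, cutoff = flag_outliers_2d(X)
print("planted point flagged:", bool(flags[0]))
```

A robust variant would replace the sample mean and covariance with robust estimates, as the abstract advises. The chi-square cutoff itself still assumes that the bulk of the data is multivariate normal, which is the dependence the paper aims to remove.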

Item metadata

id	RCAP_6af4ded2001c1fe8cdcf128ce4039331
oai_identifier_str	oai:repositorio-aberto.up.pt:10216/65794
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
dc.title.none.fl_str_mv	Detection of outliers in multivariate data: a method based on clustering and robust estimators
author	Carla M. Santos Pereira
author_facet	Carla M. Santos Pereira Ana M. Pires
author_role	author
author2	Ana M. Pires
author2_role	author
dc.contributor.author.fl_str_mv	Carla M. Santos Pereira Ana M. Pires
dc.subject.por.fl_str_mv	Ciências exactas e naturais Natural sciences
topic	Ciências exactas e naturais Natural sciences
publishDate	2002
dc.date.none.fl_str_mv	2002 2002-01-01T00:00:00Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/book
format	book
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://hdl.handle.net/10216/65794
url	https://hdl.handle.net/10216/65794
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	10.1007/978-3-642-57489-4_41
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799135554120450048
