The influence of feature grouping algorithm in outlier detection with categorical data
Main author: | Nathaniel, Sharon Femi Paul Sunder |
---|---|
Publication date: | 2024 |
Other authors: | Alwarsamy, Kala; Viswanathan, Rajalakshmi; Subramanian, Ganesh Vaidyanathan; Veerabahu, Vidhya |
Document type: | Article |
Language: | eng |
Source title: | Acta scientiarum. Technology (Online) |
Full text: | http://www.periodicos.uem.br/ojs/index.php/ActaSciTechnol/article/view/66902 |
Abstract: | Outlier mining has developed rapidly in recent years, with growing importance in fields such as banking, sensor networks, and health care. In general, anomaly detection methods are designed for numerical data and ignore categorical data. However, in real-time problems, both numerical and categorical data must be considered to obtain accurate results. Several methods are available for outlier detection in high-dimensional numerical data. This paper proposes a feature grouping algorithm for anomaly detection that also handles categorical data. The algorithm correlates the features of categorical data, forms feature clusters, and detects the outliers. Features are assigned weights based on their levels of appearance, and the outlier scores are then determined. The performance of the feature grouping algorithm is compared with traditional algorithms such as LOF and Isolation Forest and with state-of-the-art methods such as WATCH on UCI datasets. The experimental evaluation shows that the proposed algorithm performs comparatively better than the existing algorithms on categorical data. |
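
The abstract names three steps: correlate categorical features, group them into feature clusters, and derive outlier scores from how often values appear. The record does not give the paper's formulas, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes normalized mutual information as the correlation measure, a hypothetical grouping threshold of 0.3, equal weights per group, and a score equal to the rarity of each row's value pattern within a group. All function names and the toy data are illustrative.

```python
from collections import Counter
from itertools import combinations
from math import log


def entropy(values):
    """Shannon entropy (in nats) of one categorical column."""
    n = len(values)
    return -sum((c / n) * log(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Mutual information (in nats) between two categorical columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def group_features(columns, threshold=0.3):
    """Merge features whose normalized mutual information reaches `threshold`.

    `threshold` is an illustrative assumption, not a value from the paper.
    """
    names = list(columns)
    parent = {name: name for name in names}

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in combinations(names, 2):
        h = min(entropy(columns[a]), entropy(columns[b]))
        nmi = mutual_information(columns[a], columns[b]) / h if h > 0 else 0.0
        if nmi >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())


def outlier_scores(columns, groups):
    """Score each row by the rarity of its value pattern in every group."""
    n = len(next(iter(columns.values())))
    weight = 1.0 / len(groups)  # equal group weights: an assumption
    scores = [0.0] * n
    for group in groups:
        patterns = list(zip(*(columns[f] for f in group)))
        freq = Counter(patterns)
        for i, pattern in enumerate(patterns):
            scores[i] += weight * (1.0 - freq[pattern] / n)
    return scores


# Toy data (hypothetical): the last row pairs "red" with "square".
columns = {
    "color": ["red"] * 4 + ["blue"] * 3 + ["red"],
    "shape": ["circle"] * 4 + ["square"] * 4,
    "size": ["small", "large"] * 4,
}
groups = group_features(columns)
scores = outlier_scores(columns, groups)
print(groups)
print(max(range(len(scores)), key=scores.__getitem__))
```

Scoring joint value patterns per group, rather than each feature alone, is what lets the sketch flag a row whose individual values are all common but whose combination is rare. A frequency-based weight per feature or group could replace the equal weights used here.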
id |
UEM-6_00907021396737acf4070ca18a5b953d |
---|---|
oai_identifier_str |
oai:periodicos.uem.br/ojs:article/66902 |
network_acronym_str |
UEM-6 |
network_name_str |
Acta scientiarum. Technology (Online) |
repository_id_str |
|
spelling |
The influence of feature grouping algorithm in outlier detection with categorical data outlier detection; feature grouping; categorical data; lof; isolation forest. Outlier mining has developed rapidly in recent years, with growing importance in fields such as banking, sensor networks, and health care. In general, anomaly detection methods are designed for numerical data and ignore categorical data. However, in real-time problems, both numerical and categorical data must be considered to obtain accurate results. Several methods are available for outlier detection in high-dimensional numerical data. This paper proposes a feature grouping algorithm for anomaly detection that also handles categorical data. The algorithm correlates the features of categorical data, forms feature clusters, and detects the outliers. Features are assigned weights based on their levels of appearance, and the outlier scores are then determined. The performance of the feature grouping algorithm is compared with traditional algorithms such as LOF and Isolation Forest and with state-of-the-art methods such as WATCH on UCI datasets. The experimental evaluation shows that the proposed algorithm performs comparatively better than the existing algorithms on categorical data. Universidade Estadual de Maringá 2024-04-17 info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion application/pdf http://www.periodicos.uem.br/ojs/index.php/ActaSciTechnol/article/view/66902 10.4025/actascitechnol.v46i1.66902 Acta Scientiarum. Technology; Vol 46 No 1 (2024): Em proceso; e66902 Acta Scientiarum. Technology; v. 46 n. 1 (2024): Publicação contínua; e66902 1806-2563 1807-8664 reponame:Acta scientiarum. Technology (Online) instname:Universidade Estadual de Maringá (UEM) instacron:UEM eng http://www.periodicos.uem.br/ojs/index.php/ActaSciTechnol/article/view/66902/751375157436 Copyright (c) 2024 Acta Scientiarum. Technology http://creativecommons.org/licenses/by/4.0 info:eu-repo/semantics/openAccess Nathaniel, Sharon Femi Paul Sunder; Alwarsamy, Kala; Viswanathan, Rajalakshmi; Subramanian, Ganesh Vaidyanathan; Veerabahu, Vidhya 2024-04-17T14:13:24Z oai:periodicos.uem.br/ojs:article/66902 Revista https://www.periodicos.uem.br/ojs/index.php/ActaSciTechnol/index PUB https://www.periodicos.uem.br/ojs/index.php/ActaSciTechnol/oai actatech@uem.br 1807-8664 1806-2563 opendoar: 2024-04-17T14:13:24 Acta scientiarum. Technology (Online) - Universidade Estadual de Maringá (UEM) false |
dc.title.none.fl_str_mv |
The influence of feature grouping algorithm in outlier detection with categorical data |
title |
The influence of feature grouping algorithm in outlier detection with categorical data |
spellingShingle |
The influence of feature grouping algorithm in outlier detection with categorical data Nathaniel, Sharon Femi Paul Sunder outlier detection; feature grouping; categorical data; lof; isolation forest. |
title_short |
The influence of feature grouping algorithm in outlier detection with categorical data |
title_full |
The influence of feature grouping algorithm in outlier detection with categorical data |
title_fullStr |
The influence of feature grouping algorithm in outlier detection with categorical data |
title_full_unstemmed |
The influence of feature grouping algorithm in outlier detection with categorical data |
title_sort |
The influence of feature grouping algorithm in outlier detection with categorical data |
author |
Nathaniel, Sharon Femi Paul Sunder |
author_facet |
Nathaniel, Sharon Femi Paul Sunder; Alwarsamy, Kala; Viswanathan, Rajalakshmi; Subramanian, Ganesh Vaidyanathan; Veerabahu, Vidhya |
author_role |
author |
author2 |
Alwarsamy, Kala; Viswanathan, Rajalakshmi; Subramanian, Ganesh Vaidyanathan; Veerabahu, Vidhya |
author2_role |
author author author author |
dc.contributor.author.fl_str_mv |
Nathaniel, Sharon Femi Paul Sunder; Alwarsamy, Kala; Viswanathan, Rajalakshmi; Subramanian, Ganesh Vaidyanathan; Veerabahu, Vidhya |
dc.subject.por.fl_str_mv |
outlier detection; feature grouping; categorical data; lof; isolation forest. |
topic |
outlier detection; feature grouping; categorical data; lof; isolation forest. |
description |
Outlier mining has developed rapidly in recent years, with growing importance in fields such as banking, sensor networks, and health care. In general, anomaly detection methods are designed for numerical data and ignore categorical data. However, in real-time problems, both numerical and categorical data must be considered to obtain accurate results. Several methods are available for outlier detection in high-dimensional numerical data. This paper proposes a feature grouping algorithm for anomaly detection that also handles categorical data. The algorithm correlates the features of categorical data, forms feature clusters, and detects the outliers. Features are assigned weights based on their levels of appearance, and the outlier scores are then determined. The performance of the feature grouping algorithm is compared with traditional algorithms such as LOF and Isolation Forest and with state-of-the-art methods such as WATCH on UCI datasets. The experimental evaluation shows that the proposed algorithm performs comparatively better than the existing algorithms on categorical data. |
publishDate |
2024 |
dc.date.none.fl_str_mv |
2024-04-17 |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://www.periodicos.uem.br/ojs/index.php/ActaSciTechnol/article/view/66902 10.4025/actascitechnol.v46i1.66902 |
url |
http://www.periodicos.uem.br/ojs/index.php/ActaSciTechnol/article/view/66902 |
identifier_str_mv |
10.4025/actascitechnol.v46i1.66902 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
http://www.periodicos.uem.br/ojs/index.php/ActaSciTechnol/article/view/66902/751375157436 |
dc.rights.driver.fl_str_mv |
Copyright (c) 2024 Acta Scientiarum. Technology http://creativecommons.org/licenses/by/4.0 info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Copyright (c) 2024 Acta Scientiarum. Technology http://creativecommons.org/licenses/by/4.0 |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Universidade Estadual de Maringá |
publisher.none.fl_str_mv |
Universidade Estadual de Maringá |
dc.source.none.fl_str_mv |
Acta Scientiarum. Technology; Vol 46 No 1 (2024): Em proceso; e66902 Acta Scientiarum. Technology; v. 46 n. 1 (2024): Publicação contínua; e66902 1806-2563 1807-8664 reponame:Acta scientiarum. Technology (Online) instname:Universidade Estadual de Maringá (UEM) instacron:UEM |
instname_str |
Universidade Estadual de Maringá (UEM) |
instacron_str |
UEM |
institution |
UEM |
reponame_str |
Acta scientiarum. Technology (Online) |
collection |
Acta scientiarum. Technology (Online) |
repository.name.fl_str_mv |
Acta scientiarum. Technology (Online) - Universidade Estadual de Maringá (UEM) |
repository.mail.fl_str_mv |
actatech@uem.br |
_version_ |
1799315338405347328 |