An Adaptive Clustering Approach for Distributed Outlier Detection in Data Streams
Cafaro M.; Pulimeno M.; Epicoco I.
2023-01-01
Abstract
Many real-world problems deal with collections of high-dimensional data, i.e., data with many different features. A dataset with a large number of features suffers from the so-called curse of dimensionality: as the dimensionality increases, the volume of the space grows so quickly that the data become sparse. This makes clustering high-dimensional data for outlier detection challenging. In this paper, we design and implement a distributed peer-to-peer version of an algorithm that addresses the curse of dimensionality by generating candidate subspaces of the high-dimensional space through Principal Component Analysis. The experimental results show that, if the parameters of the distributed algorithm are properly set, the distributed algorithm converges to the results provided by the sequential algorithm, which is a fundamental and highly desirable property.
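The abstract does not describe the distributed protocol, so the following is only a minimal sketch of why a peer-to-peer PCA step can converge to the sequential result. It assumes (hypothetically, not necessarily the authors' method) that peers agree on global statistics through randomized pairwise gossip averaging of their local count, sum vector, and sum of outer products. Each peer then derives the covariance matrix and its leading principal component by power iteration. All function names (`local_stats`, `gossip`, `leading_component`, `demo`) and the synthetic data are illustrative assumptions.

```python
import random

def local_stats(points):
    """Hypothetical helper: (count, sum vector, sum of outer products) of a peer's points."""
    d = len(points[0])
    n = float(len(points))
    s = [sum(p[j] for p in points) for j in range(d)]
    ss = [[sum(p[i] * p[j] for p in points) for j in range(d)] for i in range(d)]
    return [n, s, ss]

def average(a, b):
    """Element-wise average of two statistic triples (one pairwise gossip exchange)."""
    n = (a[0] + b[0]) / 2
    s = [(x + y) / 2 for x, y in zip(a[1], b[1])]
    ss = [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(a[2], b[2])]
    return [n, s, ss]

def gossip(states, rounds, rng):
    """Randomized pairwise averaging (assumed protocol): every peer's triple
    converges to the mean of all peers' triples as rounds grow."""
    states = [list(st) for st in states]
    for _ in range(rounds):
        i, j = rng.sample(range(len(states)), 2)
        avg = average(states[i], states[j])
        states[i], states[j] = avg, list(avg)
    return states

def covariance(stats):
    """Covariance from averaged statistics. The 1/P scaling of the averages
    cancels in the ratios, so peers need not know the number of peers P."""
    n, s, ss = stats
    mu = [x / n for x in s]
    d = len(mu)
    return [[ss[i][j] / n - mu[i] * mu[j] for j in range(d)] for i in range(d)]

def leading_component(cov, iters=200):
    """Power iteration for the top eigenvector, i.e. the first principal component."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def demo(seed=0, peers=8, rounds=2000):
    """Compare the sequential component with each peer's gossip-based component."""
    rng = random.Random(seed)
    # Synthetic 3-D data stretched along the (1, 1, 0) direction.
    data = []
    for _ in range(400):
        t = rng.gauss(0, 5)
        data.append([t + rng.gauss(0, 0.5), t + rng.gauss(0, 0.5), rng.gauss(0, 0.5)])
    shards = [data[k::peers] for k in range(peers)]  # each peer holds a disjoint shard
    seq = leading_component(covariance(local_stats(data)))
    states = gossip([local_stats(sh) for sh in shards], rounds, rng)
    dist = [leading_component(covariance(st)) for st in states]
    return seq, dist

if __name__ == "__main__":
    seq, dist = demo()
    for v in dist:
        # |cosine| close to 1 means the peer recovered the sequential direction.
        print(abs(sum(a * b for a, b in zip(seq, v))))
```

In this sketch, `rounds` plays the role of a tuning parameter: with too few exchanges, some peers keep statistics that differ from the global average, so their principal components need not match the sequential one.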