We present FQN (Fast Qn), a novel algorithm for online computation of the Qn scale estimator. The algorithm works in the sliding window model, cleverly computing the Qn scale estimator in the current window. We thoroughly compare our algorithm for online Qn with the state of the art competing algorithm by Nunkesser et al., and show that FQN (i) is faster, requiring only O(s) time in the worst case where s is the length of the window (ii) its computational complexity does not depend on the input distribution and (iii) it requires less space. To the best of our knowledge, our algorithm is the first that allows online computation of the Qn scale estimator in worst case time linear in the size of the window. As an example of a possible application, besides its use as a robust measure of statistical dispersion, we show how to use the Qn estimator for fast detection of outliers in data streams. Extensive experimental results on both synthetic and real datasets confirm the validity of our approach.
Fast online computation of the Qn estimator with applications to the detection of outliers in data streams
Cafaro M.
;Melle C.;Pulimeno M.;Epicoco I.
2021-01-01
Abstract
We present FQN (Fast Qn), a novel algorithm for online computation of the Qn scale estimator. The algorithm works in the sliding window model, cleverly computing the Qn scale estimator in the current window. We thoroughly compare our algorithm for online Qn with the state of the art competing algorithm by Nunkesser et al., and show that FQN (i) is faster, requiring only O(s) time in the worst case where s is the length of the window (ii) its computational complexity does not depend on the input distribution and (iii) it requires less space. To the best of our knowledge, our algorithm is the first that allows online computation of the Qn scale estimator in worst case time linear in the size of the window. As an example of a possible application, besides its use as a robust measure of statistical dispersion, we show how to use the Qn estimator for fast detection of outliers in data streams. Extensive experimental results on both synthetic and real datasets confirm the validity of our approach.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.