Modeling and analyzing non-IID data in federated learning based ECG arrhythmia detection scenarios
Davide Cantoro;Angela-Tafadzwa Shumba;Gianluigi Semeraro;Mattia Cotardo;Davide Rollo;Teodoro Montanaro;Ilaria Sergi;Massimo De Vittorio;Luigi Patrono
2025-01-01
Abstract
Federated Learning (FL) enables the training of Artificial Intelligence (AI) models directly on edge devices, such as those used in Internet of Medical Things (IoMT) scenarios, without transferring sensitive patient data to centralized servers, ensuring compliance with healthcare privacy regulations. However, the inherently non-Independent and Identically Distributed (non-IID) nature of data generated by IoMT devices poses significant challenges to effective model training, including delayed or unstable convergence. This study evaluates a methodology for generating realistic non-IID datasets from existing centralized healthcare datasets to support the evaluation of FL strategies under real-world heterogeneity. Using the FedArtML dataset generation tool, we simulate varying degrees of data distribution skew through advanced statistical partitioning techniques. This enables controlled experimentation and benchmarking of FL performance in healthcare scenarios, such as electrocardiogram (ECG) arrhythmia detection. In a use case based on the MIT-BIH Arrhythmia dataset, we assess the effects of different non-IID conditions on model accuracy and computational workload across clients. While accuracy remains stable (with minimal degradation), extreme non-IID settings lead to substantial variability in training times. These findings demonstrate that controlled dataset generation using FedArtML enables realistic FL evaluations and provides insights into the operational challenges of deploying FL in clinical environments.
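To make the idea of controlled label skew concrete, the sketch below splits a centralized, labeled dataset across clients using Dirichlet-distributed class proportions, a partitioning scheme commonly used in the FL literature. It is an illustration only: it does not use the FedArtML API, the abstract does not state which partitioning technique the study applied, and the function names (`dirichlet_label_partition`, `label_histograms`), the parameter `alpha`, and the synthetic labels are all assumptions introduced here. A smaller `alpha` yields a more skewed (more strongly non-IID) split.

```python
import numpy as np


def dirichlet_label_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet label skew.

    Hypothetical helper for illustration; not part of FedArtML.
    Smaller alpha concentrates each class on fewer clients.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(n_clients)]
    for cls in np.unique(labels):
        # Indices of all samples of this class, in random order.
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        # Draw the share of this class that each client receives.
        proportions = rng.dirichlet(alpha * np.ones(n_clients))
        # Convert cumulative shares into split points over idx.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, chunk in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(chunk.tolist())
    return client_indices


def label_histograms(labels, client_indices, n_classes):
    """Per-client class counts, one row per client."""
    labels = np.asarray(labels)
    return np.array([
        np.bincount(labels[np.asarray(ix, dtype=int)], minlength=n_classes)
        for ix in client_indices
    ])


# Synthetic stand-in labels (5 classes, 200 samples each); NOT MIT-BIH data.
labels = np.repeat(np.arange(5), 200)
parts = dirichlet_label_partition(labels, n_clients=4, alpha=0.1)
hist = label_histograms(labels, parts, n_classes=5)
print(hist)
```

Each row of the printed matrix is one client's class histogram. Uneven row sums correspond to unequal local dataset sizes, which is one plausible source of the variability in per-client training time that the abstract reports under extreme non-IID settings.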


