Modeling and analyzing non-IID data in federated learning based ECG arrhythmia detection scenarios

Davide Cantoro; Angela-Tafadzwa Shumba; Gianluigi Semeraro; Mattia Cotardo; Davide Rollo; Teodoro Montanaro; Ilaria Sergi; Massimo De Vittorio; Luigi Patrono
2025-01-01

Abstract

Federated Learning (FL) enables the training of Artificial Intelligence (AI) models directly on edge devices, such as those used in Internet of Medical Things (IoMT) scenarios, without transferring sensitive patient data to centralized servers, ensuring compliance with healthcare privacy regulations. However, the inherently non-Independent and Identically Distributed (non-IID) nature of data generated by IoMT devices poses significant challenges to effective model training, including delayed or unstable convergence. This study evaluates a methodology for generating realistic non-IID datasets from existing centralized healthcare datasets to support the evaluation of FL strategies under real-world heterogeneity. Using the FedArtML dataset generation tool, we simulate varying degrees of data distribution skew through advanced statistical partitioning techniques. This enables controlled experimentation and benchmarking of FL performance in healthcare scenarios, such as electrocardiogram (ECG) arrhythmia detection. In a use case based on the MIT-BIH Arrhythmia dataset, we assess the effects of different non-IID conditions on model accuracy and computational workload across clients. While accuracy remains stable (with minimal degradation), extreme non-IID settings lead to substantial variability in training times. These findings demonstrate that controlled dataset generation using FedArtML enables realistic FL evaluations and provides insights into the operational challenges of deploying FL in clinical environments.
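
The abstract does not say which partitioning techniques were applied, so the following is only a minimal sketch of one common way to simulate label skew across FL clients: Dirichlet-based partitioning. It is not FedArtML's API and not necessarily the authors' method. The function name `dirichlet_label_skew`, the parameter `alpha`, and the toy label list are all assumptions made for illustration. A small `alpha` concentrates each class on a few clients (strongly non-IID), while a large `alpha` spreads each class almost evenly (close to IID).

```python
import random
from collections import Counter, defaultdict


def dirichlet_label_skew(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet-distributed label skew.

    Hypothetical helper for illustration only (not part of FedArtML).
    Smaller alpha -> each class concentrates on fewer clients (more non-IID).
    Larger alpha  -> class shares approach uniform (closer to IID).
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    clients = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)

        # Draw Dirichlet(alpha, ..., alpha) shares via normalized Gamma samples.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas) or 1.0  # guard against an all-zero draw
        shares = [g / total for g in gammas]

        # Turn the shares into contiguous slices of this class's indices.
        start = 0
        cumulative = 0.0
        for client_id, share in enumerate(shares):
            cumulative += share
            if client_id == num_clients - 1:
                end = len(indices)  # last client takes the remainder
            else:
                end = round(cumulative * len(indices))
            clients[client_id].extend(indices[start:end])
            start = end
    return clients


# Toy label list; these counts are NOT the real MIT-BIH class distribution.
labels = ["N"] * 600 + ["V"] * 300 + ["S"] * 100

for alpha in (100.0, 0.1):
    parts = dirichlet_label_skew(labels, num_clients=4, alpha=alpha, seed=42)
    for client_id, part in enumerate(parts):
        print(alpha, client_id, dict(Counter(labels[i] for i in part)))
```

Comparing the per-client class counts printed for the two `alpha` values gives a rough sense of how a single parameter can move a centralized dataset from near-IID to heavily skewed partitions, which is the kind of controlled variation the study describes.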

Use this identifier to cite or link to this document: https://hdl.handle.net/11587/565547