Analyzing the Impact of Non-IID Data on IoT-Enabled Federated Learning for ECG Arrhythmia Detection

Davide Cantoro; Angela-Tafadzwa Shumba; Gianluigi Semeraro; Teodoro Montanaro; Ilaria Sergi; Massimo De Vittorio; Luigi Patrono

2025-01-01

Abstract

The integration of Federated Learning (FL) into the Internet of Medical Things (IoMT) is a cutting-edge solution that enables Artificial Intelligence (AI) models to be trained directly on edge devices without sharing sensitive patient information. This approach enhances privacy while preserving the quality and effectiveness of clinical analysis. In real-world scenarios, however, physical devices often generate data that is non-independent and non-identically distributed (Non-IID), which creates significant challenges for the training process. This study proposes an experimental method to generate realistic data distributions from existing centralized datasets, capturing the real-world heterogeneity of IoT-driven federated learning infrastructures. The proposed infrastructure uses advanced statistical techniques to transform IID datasets into Non-IID distributions, enabling a systematic evaluation of the impact of Non-IID data on Federated Learning for ECG arrhythmia detection. On the MIT-BIH Arrhythmia dataset, accuracy dropped by only 0.31% in an extreme Non-IID scenario. Execution time, however, varied substantially across clients: up to 50% in the extreme Non-IID scenario, compared with 15.5% under medium Non-IID and 0.63% under IID conditions. This suggests that Non-IID data causes substantial disparities in computational workload across clients, which, consistent with theoretical expectations, can slow down and destabilize convergence.
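The abstract does not name the statistical technique used to turn a centralized IID dataset into Non-IID client shards. Purely as an illustration, the sketch below assumes a Dirichlet label-skew partition, a widely used way to simulate Non-IID clients in FL experiments, and uses per-client sample counts as a rough proxy for per-client workload. The function names `dirichlet_partition` and `relative_spread`, the concentration parameter `alpha`, and the (max - min) / mean spread metric are hypothetical choices for this example, not the authors' method or their exact variability definition.

```python
import random
from collections import defaultdict


def sample_dirichlet(alpha, k, rng):
    """Draw a k-dimensional symmetric Dirichlet(alpha) vector from normalized Gamma samples."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:  # guard against underflow for very small alpha
        return [1.0 / k] * k
    return [d / total for d in draws]


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Hypothetical helper: assign every sample index to exactly one client, with
    per-class client shares drawn from Dirichlet(alpha). Small alpha gives strong
    label skew (extreme Non-IID); large alpha gives near-uniform shares (close to IID)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    clients = [[] for _ in range(num_clients)]
    for _, idxs in sorted(by_class.items()):
        rng.shuffle(idxs)
        props = sample_dirichlet(alpha, num_clients, rng)
        # Turn cumulative proportions into monotone cut points over this class.
        bounds, acc = [0], 0.0
        for p in props[:-1]:
            acc += p
            bounds.append(min(len(idxs), int(round(acc * len(idxs)))))
        bounds.append(len(idxs))
        for c in range(num_clients):
            clients[c].extend(idxs[bounds[c]:bounds[c + 1]])
    return clients


def relative_spread(values):
    """Illustrative heterogeneity metric (assumption): (max - min) / mean."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


if __name__ == "__main__":
    # Toy stand-in for beat labels: 10,000 samples over 5 hypothetical classes.
    gen = random.Random(42)
    labels = [gen.randrange(5) for _ in range(10_000)]
    for alpha in (100.0, 1.0, 0.1):
        shards = dirichlet_partition(labels, num_clients=10, alpha=alpha, seed=1)
        sizes = [len(s) for s in shards]
        print(f"alpha={alpha}: shard sizes={sizes}, spread={relative_spread(sizes):.2f}")
```

Lowering `alpha` concentrates each class on fewer clients and makes shard sizes more uneven, which is one plausible mechanism behind the workload disparity the abstract reports. Sample counts are only a proxy for execution time, and this toy does not reproduce the reported 50%, 15.5%, and 0.63% figures.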
Files in this record:
PDFExpress_Splitech2025_ECG_Non_IID_DavideCantoro.pdf (Adobe PDF, 1.51 MB; access restricted to authorized users)
Description: Pre-print
Type: Publisher's version
License: Publisher's copyright

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11587/551447