Optimizing Data Ingestion for Big Data: A Cloud-Based Design Pattern Approach

Rucco Chiara (first author; Member of the Collaboration Group);
Longo Antonella (second author; Supervision);
Saad Motaz (last author; Supervision)
2024-01-01

Abstract

The rapid growth of data has created significant challenges in managing and using it effectively. Data engineering has emerged as a crucial discipline for addressing these challenges, providing frameworks for efficient data management. Data Engineering Patterns (DEPs) and Data Engineering Design Patterns (DEDPs) offer standardized, best-practice solutions for data engineering tasks such as ETL. Although various DEPs and DEDPs exist, high-volume data ingestion remains insufficiently addressed. This paper defines design patterns specifically for data ingestion in cloud-based architectures, covering both incremental and full refresh methods. The proposed approach uses a flexible, metadata-driven framework to improve adaptability and ease of use, allowing the ingestion type to be changed, schemas updated, tables added, and new data sources incorporated seamlessly. Experiments on the Azure cloud platform show that the proposed design patterns significantly reduce data ingestion time, addressing a key challenge in high-volume data processing.
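The abstract does not show the framework itself, so the following is only a minimal sketch of what "metadata-driven" ingestion with full refresh and incremental modes could look like. Every name here (`TableConfig`, `select_rows`, `ingest`, the `updated_at` watermark column) is a hypothetical illustration, not the authors' implementation, and in-memory lists stand in for the cloud source and target stores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableConfig:
    """One hypothetical metadata record describing how a table is ingested."""
    source: str
    table: str
    mode: str  # "full" or "incremental"
    watermark_column: Optional[str] = None
    last_watermark: Optional[int] = None


def select_rows(config: TableConfig, rows: list) -> list:
    """Pick the rows to load, driven only by the metadata record."""
    if config.mode == "full":
        return list(rows)
    if config.mode == "incremental":
        if config.watermark_column is None:
            raise ValueError("incremental mode needs a watermark column")
        if config.last_watermark is None:
            # First run: nothing has been loaded yet, so take everything.
            return list(rows)
        return [r for r in rows
                if r[config.watermark_column] > config.last_watermark]
    raise ValueError(f"unknown ingestion mode: {config.mode}")


def ingest(config: TableConfig, rows: list, target: list) -> int:
    """Load rows into target and return how many rows were written."""
    batch = select_rows(config, rows)
    if config.mode == "full":
        # Full refresh replaces the previous contents of the target.
        target.clear()
    target.extend(batch)
    if config.mode == "incremental" and batch:
        # Advance the watermark so the next run skips these rows.
        config.last_watermark = max(r[config.watermark_column] for r in batch)
    return len(batch)


# Assumed sample data: three source rows, one of which is already loaded.
source_rows = [
    {"id": 1, "updated_at": 10},
    {"id": 2, "updated_at": 20},
    {"id": 3, "updated_at": 30},
]
orders = TableConfig("sales_db", "orders", "incremental", "updated_at", 10)
target_rows = [{"id": 1, "updated_at": 10}]
loaded = ingest(orders, source_rows, target_rows)
```

In this sketch, switching a table from incremental to full refresh means changing only the `mode` field of its metadata record; the ingestion code stays the same. That is the kind of adaptability the abstract describes.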
Files in this item:
Optimizing_Data_Ingestion_for_Big_Data_A_Cloud-Based_Design_Pattern_Approach.pdf (Adobe PDF, 567.32 kB)
Access: authorized users only
Type: Publisher's version
License: Not public - private/restricted access

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11587/561286