Optimizing Data Ingestion for Big Data: A Cloud-Based Design Pattern Approach
Rucco Chiara (first author; Member of the Collaboration Group)
Longo Antonella (second author; Supervision)
Saad Motaz (last author; Supervision)
2024-01-01
Abstract
The rapid growth of data has made it increasingly difficult to manage and exploit data effectively. Data engineering has emerged as a crucial discipline for addressing these challenges, providing frameworks for efficient data management. Data Engineering Patterns (DEPs) and Data Engineering Design Patterns (DEDPs) offer standardized, best-practice solutions for data engineering tasks such as ETL. Although various DEPs and DEDPs exist, high-volume data ingestion remains insufficiently addressed. This paper defines design patterns specifically for data ingestion in cloud-based architectures, covering both incremental and full refresh methods. The proposed approach uses a flexible, metadata-driven framework that improves adaptability and ease of use, so that the ingestion type can be changed, schemas updated, tables added, and new data sources incorporated seamlessly. Experiments on the Azure cloud platform show that the proposed design patterns significantly reduce data ingestion time, addressing key challenges in high-volume data processing.

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| Optimizing_Data_Ingestion_for_Big_Data_A_Cloud-Based_Design_Pattern_Approach.pdf | Authorized users only | Publisher's version | NOT PUBLIC - private/restricted access | 567.32 kB | Adobe PDF |
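
The abstract describes ingestion that is driven by metadata and supports both incremental and full refresh loads. The paper's own framework is not reproduced in this record, so the following is only a minimal sketch of that general idea, not the authors' implementation. Every name in it (`TableConfig`, `ingest`, the `updated_at` column, the integer watermark) is a hypothetical choice made for illustration, and in-memory lists stand in for the cloud source and target.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableConfig:
    """One metadata entry describing how a table is ingested (hypothetical schema)."""
    name: str
    mode: str                               # "full" or "incremental"
    watermark_column: Optional[str] = None  # column used to detect new rows
    last_watermark: Optional[int] = None    # highest value loaded so far


def ingest(config: TableConfig, source_rows: list, target: list) -> int:
    """Load rows into target as the metadata entry dictates; return rows written."""
    col = config.watermark_column
    if config.mode == "full":
        # Full refresh: discard the target and reload every source row.
        target.clear()
        selected = list(source_rows)
    elif config.mode == "incremental":
        if col is None:
            raise ValueError("incremental mode needs a watermark_column")
        # Incremental: keep only rows newer than the stored watermark.
        selected = [
            row for row in source_rows
            if config.last_watermark is None or row[col] > config.last_watermark
        ]
    else:
        raise ValueError(f"unknown ingestion mode: {config.mode}")

    target.extend(selected)
    # Track the watermark in both modes so the mode can be switched later.
    if col is not None and selected:
        config.last_watermark = max(row[col] for row in selected)
    return len(selected)


# Usage: changing the ingestion type is a metadata edit, not a code change.
source = [{"id": 1, "updated_at": 10}, {"id": 2, "updated_at": 20}]
target = []
orders = TableConfig("orders", "incremental", "updated_at")
ingest(orders, source, target)                 # first run loads both rows
source.append({"id": 3, "updated_at": 30})
ingest(orders, source, target)                 # second run loads only the new row
orders.mode = "full"
ingest(orders, source, target)                 # full refresh reloads all rows
```

In this sketch, adding a table or switching between incremental and full refresh only means adding or editing a `TableConfig` entry, which is the kind of flexibility the abstract attributes to a metadata-driven design.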
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


