Optimizing Data Ingestion for Big Data: A Cloud-Based Design Pattern Approach

Rucco, Chiara; Longo, Antonella; Saad, Motaz K H

doi:10.1109/BigData62323.2024.10825970

The rapid growth of data has created significant challenges in managing and leveraging it effectively. Data engineering has emerged as a crucial discipline for addressing these challenges, providing frameworks for efficient data management. Data Engineering Patterns (DEP) and Data Engineering Design Patterns (DEDP) offer standardized, best-practice solutions for data engineering tasks such as ETL. Although various DEPs and DEDPs exist, high-volume data ingestion remains insufficiently addressed. This paper defines design patterns specifically for data ingestion within cloud-based architectures, covering both incremental and full-refresh methods. The proposed approach uses a flexible, metadata-driven framework to improve adaptability and ease of use, allowing seamless changes to the ingestion type, schema updates, table additions, and the incorporation of new data sources. The design patterns were validated on the Azure cloud platform, where experiments show that they significantly reduce data ingestion time. The work thereby contributes to data management by addressing key challenges in high-volume data processing.
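The abstract does not describe the framework's internals. The following is a minimal sketch of the general idea of metadata-driven ingestion. It assumes a hypothetical `TableConfig` metadata record and uses in-memory dicts as stand-ins for the source systems and the cloud target. It is not the authors' Azure implementation. A single list of metadata entries drives both full-refresh loads and watermark-based incremental loads.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableConfig:
    """Hypothetical metadata entry describing how one source table is ingested."""
    name: str
    load_type: str                          # "full" or "incremental"
    watermark_column: Optional[str] = None  # column used to detect new rows
    last_watermark: Optional[int] = None    # high-water mark from the previous run


def ingest(config, source_rows, target):
    """Load one table into target[config.name]; return the number of rows copied."""
    if config.load_type == "full":
        # Full refresh: replace the target table with a complete copy of the source.
        target[config.name] = list(source_rows)
        return len(source_rows)
    if config.load_type == "incremental":
        col = config.watermark_column
        # Incremental load: keep only rows newer than the stored high-water mark.
        new_rows = [r for r in source_rows
                    if config.last_watermark is None or r[col] > config.last_watermark]
        target.setdefault(config.name, []).extend(new_rows)
        if new_rows:
            # Advance the watermark so the next run skips rows already loaded.
            config.last_watermark = max(r[col] for r in new_rows)
        return len(new_rows)
    raise ValueError(f"unknown load_type: {config.load_type}")


def run_pipeline(metadata, sources, target):
    """Ingest every table listed in the metadata; adding a table means adding an entry."""
    return {cfg.name: ingest(cfg, sources[cfg.name], target) for cfg in metadata}


# Usage with hypothetical tables: one full-refresh, one incremental.
metadata = [
    TableConfig("customers", "full"),
    TableConfig("orders", "incremental", watermark_column="updated_at", last_watermark=100),
]
sources = {
    "customers": [{"id": 1}, {"id": 2}],
    "orders": [{"id": 10, "updated_at": 90},
               {"id": 11, "updated_at": 120},
               {"id": 12, "updated_at": 150}],
}
target = {}
print(run_pipeline(metadata, sources, target))  # {'customers': 2, 'orders': 2}
```

Because the load type is stored in metadata, switching a table between full and incremental ingestion, or adding a new table, only changes a configuration entry and leaves the pipeline code untouched. The strict `>` comparison assumes watermark values increase monotonically. A production design would also need to account for late-arriving rows.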